set the channel topic: Slack channel for the image-builder project: https://github.com/kubernetes-sigs/image-builder
I'll kick it off from the thread I started in sig-cluster-lifecycle: could I get a review/approval for the image builder for Windows? If there is anything holding it back please let me know.
Happy to answer questions or address any feedback. Thanks!
I’m hoping at least @codenrhoden and/or @moshloop / @jdetiber reviews it, and maybe someone who is more familiar with the windows specific scripts than I am like @Kalya Subramanian or @Mark Rossetti
heads up I’m seeing the Azure 20.04 build fail in a few PRs with vhd-ubuntu-2004: fatal: [default]: FAILED! => {"changed": false, "msg": "Failed to update apt cache: unknown reason"}
The timing is strange but I don’t believe this is related to my recent job config update, it looks like a legitimate packer failure
nothing has changed AFAIK in the image-builder scripts so this might be an issue with either a new apt package or with a new base image for ubuntu 20.04
yay new channel! @cecile @moshloop just wanted to say I am sorry I still haven’t been active in GitHub doing reviews/development. I’ve still had some ongoing health problems, and I even spent yesterday getting a CT scan and scheduling a follow-up surgery for next Tuesday. 😞 Hopefully after that (plus a bit of healing time) things will return to normal.
No problem at all, please take care of yourself! Your health is top priority, everything else can wait. Good luck with the surgery, I hope everything comes back to normal soon for you 🤞
Should we cancel this week’s office hours due to Kubecon NA?
Gentle reminder: in the last bi-weekly call, we agreed to merge the Flatcar PR , and address remaining things afterwards incrementally. It would be great to see it merged. 🙂
Ready with the Windows PR as well. Thanks to everyone that has reviewed it and tried it out so far!
Hi folks, @moshloop and I will be giving a talk on "Deep dive: K8s Image builder" at Kubecon'20 for those who are attending.
Time: 2:55 PM EST / 11:55 AM PST
Please approve the PR to fix to build ubuntu 20.04 CAPI image for OpenStack.
Simply updated to 20.04.1 and its checksum, since the 20.04 ISO image checksum is gone.
We have found a few bugs with ubuntu and the current http proxy handling and I've created a PR that fixes them 😄
In the quest to make our images build behind an http proxy I've stumbled across another annoyance:
Kubeadm image pull doesn't respect the proxy as it's set right now.
I'm unsure what would be the best way to handle this. I guess the easiest way would just be to skip the pulling behind http proxies and then rely on having a working registry for actual bootstrap
Yes, you should be able to skip image pull with kubeadm, we use a custom role to pull images in a different manner
hmm, it's correctly set for my CI, it just seems like packer is not passing it to the GOSS provisioner. Have any of you encountered that behaviour yet?
kubeadm is doing crictl pull, so a proxy setting on kubeadm itself should be ineffective
no, but I had hoped to avoid that 😄
As I need to modify the systemd service for that
looks like I have to grow my PR once more 😄
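For context, the systemd service change mentioned above is usually done with a drop-in that passes proxy env vars to containerd (proxy address and NO_PROXY list below are just example values, adjust for your cluster CIDRs):

```ini
# /etc/systemd/system/containerd.service.d/http-proxy.conf
# (hypothetical proxy endpoint, shown for illustration)
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,10.0.0.0/8"
```

Followed by `systemctl daemon-reload && systemctl restart containerd`, crictl pulls should then go through the proxy.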
another way would be to pull the image locally, export it, copy to the machine, import it and then run kubeadm
seems like it is not possible to set up the goss provisioner so that the initial download supports a proxy
It has env flags, but those aren't in play at that stage 😕
The best workaround that I can currently see is to fetch the goss binary in ansible and then set the provisioner to skip the download and use the prefetched binary
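If I remember the provisioner's options right, that workaround would look roughly like this in the packer config (the `download_path` value is an example; the binary would have been placed there by an earlier ansible/shell step):

```json
{
  "type": "goss",
  "skip_install": true,
  "download_path": "/tmp/goss",
  "tests": ["goss/goss.yaml"]
}
```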
In the same process I've noticed that packer fetches the OVF directly from the node and not from the vCenter like a lot of other tooling does. And it doesn't fall back on the vCenter pull. Have you noticed the same @naadir or do I just have a weird packer config running?
i was under the impression it functions as a 301 redirect unless you're using a content library?
pretty sure it proxies it. I can export the OVF fine, although I only have a connection to the vCenter and can't reach any of the hosts directly
if you download from the vcenter ui, you'll also get an error unless you accept the esxi tls cert for the host vcenter is telling the browser to download from
hmm, from the browser I'm getting a URI under cls/data/0e4b042c-45bd-4fea-aea1-38e1a5e2c81c/Ubuntu20.04template-1.vmdk on the vCenter
it isn't though. It's a packer-created VM that never got uploaded to a content lib.
And there isn't one configured for the cluster or vCenter anyway
well, I guess I need to file a bug/feature request with packer to get clarification
Merged. Going forward, if you are interested in becoming an image-builder reviewer and have been helping with PR reviews/made significant contributions to the project, please open a PR to add yourself to the reviewers list (note that you must already be a member of kubernetes-sigs).
Thanks to Travis for approving the Flatcar PR, but the CI failed , and I cannot say /retest. Can anyone with privilege please give it /ok-to-test?
Woohoo! Thanks for hanging in there @dongsupark and seeing this all the way in. It was a lot of work! Congrats!
Thank you so much for taking care of it, even during your health recovery. 🙂
@Arunkumar Venkataramanan (DeepBrainz AI) has joined the channel
@dongsupark I proposed a Flatcar-related PR. Please review
is it possible to launch the images built by image-builder without CAPI? If so, how do you get the kubeconfig?
You can build VMs using those images on any infrastructure/cloud provider. Are you asking about building k8s clusters though?
yeah, so I built an AMI (that I eventually plan to add some extra provisioners to, to install our software via helm charts). I can deploy it via CAPI but I'm wondering if we could also share that AMI with a customer so they could just run it in their account.
I’m no AWS expert but I’m pretty sure there are ways to share AMIs with other users, @naadir might be able to help
yeah, that part isn't an issue. It's more, does that AMI have to be deployed via CAPI. When I deployed the AMI manually and SSH'd in, it didn't seem to have a valid kubeconfig to be able to run kubectl
if all you want to do is have a single node cluster without cluster api involved, you can run kubeadm init on that machine
The AMI itself is just an OS image with preinstalled components and container images. In order to bootstrap the VM into a Kubernetes cluster you need some mechanism such as CAPI (which uses kubeadm underneath) or kubeadm directly to run
Thanks. I just found the kubeadm init stuff and have just run that now.
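For anyone landing here later, the single-node flow looks roughly like this (a sketch; the untaint is only needed if you want workloads on the control plane, and the taint name changed to `control-plane` in newer releases):

```shell
# run on the booted image as root
kubeadm init
# the kubeconfig asked about above lands here after init
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl taint nodes --all node-role.kubernetes.io/master-
```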
Folks, is there any method to estimate how much resource should be planned for image runtime requirements?
I started work on adding goss validation for windows. It required some updates to the goss provisioner. Is there someone I can ping for a review?
dive is packaged as a deb and an rpm; you can look at the documentation on how to include additional deb or rpm packages
Hi Folx, i would like to hear your opinion on this:
We have this internally and i think it might be useful for other folx as well, especially in the dockerhub ratelimit days 🙂
I can try joining the call today to chat about it
@codenrhoden has CAPV shipped any ubuntu 20.04 image yet? at least officially
Nothing I’ve ever pushed. We really should start doing that. 🙂 Nothing stopping it from happening.
yeah, but that might be the best point to introduce EFI builds as @naadir wanted to do a few months ago 😄
hmm, we added ovftool as a dependency for vsphere builds, but haven't updated the dockerfile yet
And the download is behind a registration wall, so much for "just adding it to the image"
I have some capacity spinning up soon and plan to tackle this + ova python script in the cli
the EULA of ovftool makes shipping it in the container impossible at first glance. Am I right @codenrhoden?
For now, I've settled for this, which requires you to download the bundle on your own beforehand:
ENV LC_CTYPE=POSIX
ENV OVFTOOL_FILENAME=VMware-ovftool-4.4.1-16812187-lin.x86_64.bundle
ADD $OVFTOOL_FILENAME /tmp/
RUN /bin/sh /tmp/$OVFTOOL_FILENAME --console --required --eulas-agreed && \
    rm -f /tmp/$OVFTOOL_FILENAME
Hi @Maximilian Rink - did you encounter something like this -
And am I the only one that noticed that the OVAs don't have the systemd-timesyncd service enabled?
Hmm, I see extrarepos and I see extradebs, but I don't see a way to add extra keys. 😞
Docker build is failing with the following:
#16 132.9 hack/ensure-ovftool.sh
#16 132.9 ovftool must be present to build OVAs. If already installed
#16 132.9 make sure to add it to the PATH env var. If not installed, please
#16 132.9 install latest from .
#16 132.9 make: *** [Makefile:90: deps-ova] Error 1
------
executor failed running [/bin/sh -c make deps]: exit code: 2
make: *** [Makefile:592: docker-build] Error 1
Thanks Fabrizio. I still haven’t been able to move past the issues I was showing before. I’m putting the config file in place as expected, but containerd status doesn’t show that the change has been picked up. I probably need to pair with someone early next week to work through it. I know this needs to happen.
No luck on my side with this. same issues with the config change not taking effect. I’d like to pair with @neolit123 when possible to figure it out.
it looks like there are Prow flakes happening with regards to networking
we’ll have to see if we can find a way to do some sort of Windows CI.
We’ve been working on getting the OVA stuff in place. Still running into basic issues with Packer over a VPN connection between Prow and the cloud provider. Working on it though!
@codenrhoden @cecile I opened a pr to mitigate the failures we were seeing:
hi, I am making changes to image-builder to support RHEL8 for vsphere ISO and I can send a PR for it. The VMDK is created and in the step of creating OVF and OVA, images/capi/hack/image-build-ova.py expects to have an entry for rhel8-64 in OSidmap which is not present. I do not know the OSID and version which is needed by vmware. Can someone please help with this info?
OSidmap = {"vmware-photon-64": {"id": "36", "version": "", "type": "vmwarePhoton64Guest"},
           "centos7-64": {"id": "107", "version": "7", "type": "centos7-64"},
           "rhel7-64": {"id": "80", "version": "7", "type": "rhel764guest"},
           "ubuntu-64": {"id": "94", "version": "", "type": "ubuntu64Guest"},
           "Windows2019Server-64": {"id": "112", "version": "", "type": "windows9srv-64"},
           "Windows2004Server-64": {"id": "112", "version": "", "type": "windows9srv-64"}}
for the time being I added this line and the script was able to move ahead, but I would like to know the correct id/version/type for rhel8:
"rhel8-64": {"id": "80", "version": "8", "type": "rhel864guest"},
we should probably set for the OVAs, right?
👋 I'm trying to add Flatcar support to image-builder for Azure and I have some questions:
We use the VHDs for cluster-api-azure for the sample images, since SIGs (Shared Image Galleries) do not allow public images. If there are limitations I believe it would be fine to do only SIGs, but then we wouldn't be able to add automated tests to cluster-api-azure, I think.
Thanks, that makes sense. I'll work on making VHDs work.
Has anyone else noticed any issues with recent capi image-builder images related to cloud-config and cloud-final services not running on Ubuntu 18.04 with recent builds?
well, I'm not quite sure if it's an issue of my own making with the raw image builder PR 🙂
Didn’t run into any issues when we built new k8s images this week for azure
Yeah, I figured out the issue, since I'm trying to use the ec2 datasource in an unknown environment I needed to lay down a config file in /etc/cloud-ds-identify.cfg so that the systemd generator would do the right thing...
Hello guys 👋, I’m trying to do a capi image build for an aws ami. Ansible errors out saying it needs to be root to perform yum commands on the remote ec2-instance. The interesting part is the build works completely fine on mac (local) and also on an ec2-instance running amazon linux 2. It only errors out when run in an amazon linux 2 container.
I created this issue as well with logs -
I’d appreciate it if anyone has ideas on how to go about debugging this.
Hey I’m curious if others are using Image Builder like I am, or I’m doing this in a “bad” way. Currently I’ve got a repo which has our ansible customizations and has image-builder as a git submodule (pinned to a specific tag), my CI pipeline copies the ansible customization into the ansible directory (since they aren’t open source/public) and i’m using the makefile based on the image builder book.
Instead of using it as a git submodule, could you use the published container image and bind mount in your customizations?
I could, that makes sense, I didn’t realize that was an option, I’ll lookup the images and start using them.
ok yeah that will work perfectly, just need to copy around some files in the CI pipeline because I can’t choose where my repo is mounted
https://calendar.google.com/calendar/u/0/r/week/2021/4/8?eid=Y3I5dHJwbmVucjVvcWEybXBnO[…]0MDhUMTUwMDAwWiBjYWxlbmRhckBrdWJlcm5ldGVzLmlv&pli=1&sf=true Is a link I think
@Kubernetes Moderator Service has joined the channel
@codenrhoden any objection to a hack/serve.py script just so we have a canonical example of how to serve artifacts on a url in image-builder?
I think that could be useful, yeah. If people are really enterprising, it can also be a way to speed up local builds that are purely upstream. I imagine it ends up just being a wrapper around the python http.server module?
yeah, exactly, nothing special, but something to make bikeshedding less dangerous because there's a canonical implementation to borrow
want to make clear my intentions here :) the goal is not to solve all problems but just to say "hey, here's an example" so we can get away from abstract descriptions of something that can be done in like 4 lines of code :)
(it basically is a way for anyone downstream to have a minimal impl of an artifact server) so that there's an end-to-end example upstream of how to use image-builder w/ custom inputs
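It really is about that small; a minimal stand-in for such a hack/serve.py (directory and port are example values) is just:

```shell
# serve build artifacts (assumed to be in ./output) over HTTP so
# packer/ironic/etc. can fetch them by URL
python3 -m http.server 8000 --directory ./output
```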
Is there a way to run the image builder on an existing machine? I have a need to add a baremetal server to a capv cluster and need to prepare the node. Has anyone done something like this, or any guidance on how to extract the relevant parts of image builder in order to prepare the node correctly?
@jsturtevant sorry it was yours!! 😄 this has some notes on running ansible on its own
I see that in the url.yaml ansible task for downloading k8s binaries no retry logic has been added, whereas other downloads like the k8s images do have retry logic. Is there a reason for this? Would it be possible to add? If this is something that others would find helpful I can create a PR with the suggested change
I don't think there is any particular logic about that, it is likely just who made the tasks; maybe put an issue in to track this, it seems like a sensible ask to me
https://github.com/kubernetes-sigs/image-builder/issues/595
For vsphere images, is there a packer config for ova disk size? The templates are creating with a 20Gi disk. I can't find where to increase that.
The base image disk size is just for the OVA; when actual clusters are stood up you can change the cloned image's disk size
ok, that's what I figured, but I'm not seeing something correctly with the provisioning. I have set the disk size in the cluster config yaml, but it seems not to take. Probable user error.
in the cluster config yaml via VSphereMachineTemplate:
kind: VSphereMachineTemplate
metadata:
  name: dev-cluster
  namespace: dev
spec:
  template:
    spec:
      cloneMode: linkedClone
      datacenter: natelab
      datastore: esxi-local-1
      diskGiB: 75
that was my initial thought, but I couldn't figure out how to set it with image builder. But I am only using the available base and clone images and adding a deb with image builder.
are there any plans to allow a template file to pull in modules/templates? sort of like a kustomization patch pattern.
I suppose I was looking at the template purpose from the wrong angle: use the template for the bare minimum CNI and CSI, then use a post-provisioning stage with kustomize for the rest.
For things like cni and csi I would look at cluster resource sets or something like kapp-controller
ah, yes, that is what I'm using. I didn't post this in the right channel. meant to post to base cluster-api.
registry.config_path
looks promising and should make exposing the mirror settings easier.
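For reference, exposing a mirror through `registry.config_path` would look something like this (registry host, mirror URL, and the certs.d path are example values):

```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
# assumes containerd's config.toml sets:
#   [plugins."io.containerd.grpc.v1.cri".registry]
#     config_path = "/etc/containerd/certs.d"
server = "https://registry-1.docker.io"

[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]
```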
Is there a way to disable exports of the ova file with image-builder? Image builder is the preferred way to build images for things like TKG downstream, but in that scenario there's actually no need to have a .ova file created. The template on the vsphere cluster that packer creates is sufficient. But I cannot for the life of me figure out a way to override the export config with a later .json file. Any ideas?
This is pretty much a limitation with how we have our Packer config files structured at the moment. But I am in full agreement with you:
fyi on some potential release changes in packer goss provisioner @codenrhoden
I’m trying to build an OVA in image-builder from a CICD pipeline and it’s giving me an error saying python3 isn’t in the path. But here’s output showing that’s not necessarily true.
User's Python3 binary directory must be in $PATH
Location of package is:
Location: /root/.local/lib/python3.8/site-packages
$PATH is currently: /root/.local/lib/python3.8/site-packages:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
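From that output, the likely culprit is that `$PATH` contains the site-packages directory rather than pip's script directory; pip installs console scripts to `~/.local/bin`, so (a guess, but it matches the error) the fix is:

```shell
# prepend pip's user script directory, which the ensure-* check looks for,
# instead of the site-packages dir currently first in $PATH
export PATH="$HOME/.local/bin:$PATH"
```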
hmm, what is the difference between the raw and qemu build targets?
as far as I can see it's only the disk image format. anything I'm missing here?
yeah, I've come across that building images for bare metal, and as ironic supports qcow2 I've left it at qemu for now. Saves my builder a lot of space 😄
in general, I'm going to put up a PR for both builders tomorrow, bringing goss and the additional commands from other builders to these builders as well.
they worked fine for me and I see no reason to diverge here 😄 correct me if I'm wrong @dan @jdetiber
Not sure if this is the right place to ask, but it looks like the documentation for building your own custom OVA images to use with TKG is out of date with the latest v1.20.5; at the least, the directory structure and instructions no longer match up
Ah, I did try subbing in 1.3 but maybe it should have been 1.3.1. Let me take a look at the docs again, shame it doesn’t have a drop-down to jump to a specific version
Yes, this is MUCH easier! I’m able to build a default image now
Do you happen to know where the packer definitions are within the container image, to specify which ISO to use?
the files there are:
centos-7.json
esx.json
linux
OWNERS
packer-common.json
packer-haproxy.json
packer-node.json
packer-windows.json
photon-3.json
rhel-7.json
ubuntu-1804.json
ubuntu-2004.json
vmx.json
vsphere.json
windows
windows-2004.json
windows-2019.json
@William Lam All the JSON files in the container image are just copied straight from the repo, so it has all the same defaults whether you are using a container or not. All of those variables can be overridden via flags or your additional custom JSON file, so you don't have to go in and edit files within the container.
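As a sketch, overriding the ISO that way looks roughly like this (the variable names match the common packer configs, but the file path, URL, and make target are example values):

```shell
# write an override file, then point the build at it
cat > /tmp/vars.json <<'EOF'
{
  "iso_url": "https://example.com/ubuntu-20.04.1-legacy-server-amd64.iso",
  "iso_checksum": "none"
}
EOF
PACKER_VAR_FILES=/tmp/vars.json make build-node-ova-vsphere-ubuntu-2004
```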
Yup, figured it out with some pointers from @vrabbi
Hello all 👋
I might be missing a point, can anybody rubber-duck me on my attempts to build an Azure SIG image?
I deployed a VM on azure, installed all the deps and I am trying to run make build-azure-sig-ubuntu-1804
I am getting this error
Error initializing core: error interpolating default value for 'crictl_url': template: root:1:66: executing "root" at <user `crictl_version` ...>: error calling user: test
What am I missing? Thanks
But the crictl_version var is defined in one of the packer files:
$ grep "crictl_version" packer/azure/**.json
packer/azure/packer.json: "crictl_url": "https://github.com/kubernetes-sigs/cri-tools/releases/download/v{{user `crictl_version`}}/crictl-v{{user `crictl_version`}}-linux-amd64.tar.gz",
packer/azure/packer.json: "crictl_version": "1.21.0",
@cecile @jsturtevant I know it's last minute, but I'm not going to be able to make office hours this week (tomorrow morning). Attendance has been pretty low, but I'm hesitant to just cancel it. My plan is just to write in the doc that I won't be there, but am hoping any discussion makes its way into notes. Does that seem reasonable?
let’s see if anyone else has agenda items by EOD and if not we can just cancel
Doesn't look like we have an agenda. Should we cancel for today then?
btw, what are your thoughts around systemd-boot? together with systemd 248+ it is possible to get working disk encryption with TPM auto-unlock that isn't a hassle to maintain. I'm currently looking into that for our bare metal nodes
using it where the distro doesn't support it might be problematic in terms of support from the vendor
but tbh, systemd-boot has proven more stable in our env than grub2 and configuration is sooo much easier
I went to Lennart's talk on systemd-boot a few years ago at FOSDEM and was mostly convinced then
that said, @Patrick Daigle, would it be possible to ask Canonical if they would support Ubuntu images which use systemd-boot as the bootloader? Getting full disk encryption for free would be a bonus
No promises, but I can try bringing this up in our ongoing conversations and touch points.
hey folks, if I'm not misreading this, we are assuming that open-vm-tools is a base package in our goss checks
and
that is messing up my bare metal images, which obviously don't have that tooling.
is there any other provider requiring open-vm-tools besides vmware?
If not, I'm amending my PR for qemu goss tests to remove that from the defaults
@cecile or @codenrhoden can you reopen ? Feel like it's
better for someone who knows the image-builder stuff to quickly make that PR and merge as needed.
I don't mind doing it, but it's an OWNERS file so it's probably something that can be done w/ a quick handshake by the regular image-builder owners :)
{% if kubernetes_semver is version('v1.21.0', '>=') %}
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
{% endif %}
Is there a reason the SystemdCgroup section requires 1.21?
Just became the default in 1.21 so it was easier than making sure the underlying machine image was configured accordingly
Makes sense. I think we are fine without it under 1.20. We will be on 1.21 within a few months is my guess anyways. Thank you!!!
yes, but you /really/ need it for 1.21, as otherwise you won't be able to bootstrap a node with the kubeadm settings CAPI uses for 1.21
Hello! I might be missing something in the build flow and it would be awesome if someone could point me in the right direction.
I run make quick-release in order to build my changes, but I don't see a kubelet image even though I have modified code there
~/g/s/k/kubernetes (px-translation-library) [1]> ls -la _output/release-images/amd64/
It looks like the kubelet has been built according to the logs, but I can't find an output that I could load into a docker container and deploy later
total 422624
drwxr-xr-x 2 oksana oksana 4096 Jun 25 12:00 .
drwxr-xr-x 3 oksana oksana 4096 Jun 25 12:00 ..
-rw------- 2 oksana oksana 126286848 Jun 25 12:00 kube-apiserver.tar
-rw------- 2 oksana oksana 120957952 Jun 25 12:00 kube-controller-manager.tar
-rw------- 2 oksana oksana 133050368 Jun 25 12:00 kube-proxy.tar
-rw------- 2 oksana oksana 52460544 Jun 25 12:00 kube-scheduler.tar
oksana@dev-onaumov ~/g/s/k/kubernetes (px-translation-library)>
+++ [0625 11:55:37] Building go targets for linux/amd64:
cmd/kube-proxy
cmd/kube-apiserver
cmd/kube-controller-manager
cmd/kubelet
cmd/kubeadm
cmd/kube-scheduler
uhm, image-builder doesn't have a quick-release target, have you maybe picked the wrong channel?
How do you go about making sure the images are up to date when in production? I’d assume I’d want to routinely rebuild the images with a unique name (including the date most likely) and update my CAPI machines to use them. If so, is there a way to add a suffix to the template name?
Basically yes. In practice, I think it will be slightly different for each provider and the way they are distributed.
@jackfrancis has been experimenting with a way to keep nodes fresh in a long-running cluster: . It is an interesting approach you might check out for ideas
There's a content library for updated images if you're using the built-in kubernetes
hmm, we are currently not cleaning up the netplan folder after we build images. Is there a specific reason for that or is it just an oversight 😄
hmm, is there an ansible module that can do that? from the man page, --vacuum-* only cleans up archived data from journald
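I'm not aware of a dedicated module; a hedged sketch using the shell module (the --rotate is what archives the active journal so the vacuum actually catches it):

```yaml
# hypothetical cleanup task, task name is illustrative
- name: Truncate journald logs before sealing the image
  shell: journalctl --rotate && journalctl --vacuum-time=1s
```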
though you've not lived until you've run COM inside .NET inside Ruby inside Java to interact with Windows logging.
uhm, my worst sin actually is realtime programming on windows 😄
Running a Soft-PLC on an Intel Atom with one core to run a pharmaceutical machine is pain
anyway, Poettering said in 2015
The journal will not be bus enabled any time soon, as dbus-daemon logs to it, and this would hence mean a cyclic dependency where dbus daemon logs to journald, and journald uses dbus-daemon's IPC services... This can be fixed only when we have kdbus where the whole broken idea of userspace IPC is gone...
📣 Hi all, since the last couple of image-builder office hours have had very low attendance and no agenda topics we decided to try something new. We’ll add the agenda entry to the notes a week before the next meeting so folks can add topics throughout the week (just added the one for next week in Image Builder Office Hours - Google Docs) and if there are no topics added by the night before the meeting we’ll cancel the meeting occurrence. Let’s try this out a couple of times and see how it works out.
@codenrhoden @naadir the ova ci is broken as credentials got rolled
it's concerning that those could get rotated without me knowing about it... Unfortunately I can't really dig into this right now (and of course it's like 1 week after we got it all turned on and working), so I may have to just turn it off for now.
I sure wish the /override command worked. 🙂 I can open a PR to make this OVA CI optional later tonight, but have to run out the door right now. kid's tee ball game
Hello all, I’m trying to run goss validate on an image built by image-builder with overrides manually, and I seem to be missing something. Packer running goss at end of run seems to work fine, but when I try to run goss validate with all the variables filled in, it seems to fail with index of nil pointer error, which I’m pretty sure is my mistake somewhere. Appreciate any help here. I filled in the goss-vars.yaml as well.
root@vignesh-6wdxg:~/image-builder/images/capi/packer# sudo goss -g goss/goss.yaml --vars goss/goss-vars.yaml --vars-inline '{"ARCH":"amd64","OS":"Ubuntu","PROVIDER":"ova"}' validate
Error: could not read json data in goss/goss-command.yaml: template: test:61:24: executing "test" at : error calling index: index of nil pointer
I made it work by providing all the overrides through --vars-inline, but it would be interesting to know why populating the goss-vars.yaml file won’t work.
https://github.com/containerd/containerd/releases/tag/v1.5.4
We should probably update containerd 😅
hmm, I'm also seeing some govc import errors on new OVAs created by image-builder
Manual imports through the UI are fine though, so I don't know what is going on here
Hmm, this is so weird, it seems to be a ci-only thing.
from my local machine /everything/ works, although I'm literally just tunneling all traffic through the machine that runs the CI jobs
Hi all, added an agenda entry for this Thursday’s office hours in Image Builder Office Hours - Google Docs - please add your topics items before Wednesday EOD
Cancelling tomorrow’s meeting due to no agenda items
I’m (probably) going to need to load an internal root certificate into my images for CAPV. Is this possible with Image Builder?
yes, implement your own role that does that and set the custom role var
I'm getting lots of DMs with folk wanting . tbh, I think it's a bit suspect but it seems like loads of orgs deploy .local domains
This would be great! I know .local shouldn't be used, but I bump into .local domains all the time, and in k8s without this workaround it's an absolute nightmare
it's merged @vrabbi but you'll need to enable it manually as it's technically a "leak", it's in the docs.
building an image now to test it out in my .local lab (i created a dedicated lab to test out .local domain issues with K8s)
@codenrhoden @naadir @jdetiber we were thinking about pushing some of our baremetal code for image building upstream
In addition to the packer based builder we also have debootstrap for installing ubuntu, which is significantly faster and produces slimmer images
pinging @Anusha Hegde , @Jamie Monserrate, @Dharmjit and @Shailesh Pant from our Edge team who might have some opinions
I've been interested in debootstrap and the rhel-derivative equivalent. I would start off with an issue describing what you want to do, and we'll need active maintainers etc...
Thanks @naadir for the add. This most definitely is interesting and, like you mentioned, an issue detailing the approach and proposal would be a great place to start collaborating 👍
Does anyone have agenda items / topics for tomorrow’s office hours? Image Builder Office Hours - Google Docs
I'm not seeing anything. Going to put the 'cancelled' banner in there I guess. 🙂
This is last error I'm getting:
Build 'sig-windows-2019' errored after 1 second 43 milliseconds: the Shared Gallery Image to which to publish the managed image version to does not exist in the resource group image-builder-e2e-ey01nu
I am a bit confused as I don't know why it passed, I thought each image name needed to be unique
No agenda items in the doc this morning, wrote a note saying today's office hours are canceled. I've got a few things that probably make sense to add for next go around in 2 weeks.
@yoctozepto (Radosław Piliszek) has joined the channel
Hello all, I'm trying to do a qemu image build of ubuntu 2004 and I'm getting stuck on waiting for SSH to become available. Any pointers here would be helpful.
==> qemu: Waiting for SSH to become available...
2021/09/15 23:44:29 packer-builder-qemu plugin: [INFO] Attempting SSH connection to 127.0.0.1:2753...
2021/09/15 23:44:29 packer-builder-qemu plugin: [DEBUG] reconnecting to TCP connection for SSH
2021/09/15 23:44:29 packer-builder-qemu plugin: [DEBUG] handshaking with SSH
2021/09/15 23:45:03 packer-builder-qemu plugin: [DEBUG] SSH handshake err: ssh: handshake failed: read tcp 127.0.0.1:51640->127.0.0.1:2753: read: connection reset by peer
I need to install some trusted CA certificates into images for CAPI. These need to be for image repositories so it goes in directories specific to that repository address. Is there a built in task I can use for this or do I need to create a custom role?
Why do this inside the image? That seems like a bad idea to me. If you need to replace the certs it's a pain. I find it much easier to use files directives in the KubeadmConfigTemplate and KubeadmControlPlane objects. That also allows for an easy rolling update with a changed cert without needing to change the template
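A rough sketch of that files directive (API version, cert path, and the Ubuntu-specific update command are assumptions; the same files/preKubeadmCommands blocks go in KubeadmControlPlane under spec.kubeadmConfigSpec):

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
spec:
  template:
    spec:
      files:
        # drop the internal root CA where the OS trust store picks it up
        - path: /usr/local/share/ca-certificates/internal-root.crt
          permissions: "0644"
          content: |
            -----BEGIN CERTIFICATE-----
            ...
            -----END CERTIFICATE-----
      preKubeadmCommands:
        # refresh the trust store before kubeadm (and containerd pulls) run
        - update-ca-certificates
```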
What’s the difference between putting it in KubeadmConfigTemplate and KubeadmControlPlane? Is it that one is focused on control plane nodes and the other on worker nodes?
And is this enough to get containerd to connect to an image repository whose certificate is signed by this CA?
I'm trying to get the magic incantation to make this work. I had it working last night and can't reproduce it.
If the cert is in the OS's trusted CA certs then containerd should respect that
I'm new to PKI and all this so the format stuff is a bit of black magic. Is there a way to see which format something is encoded in?
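One way to answer that (a sketch, assuming openssl is available; the file paths are stand-ins): PEM is base64 text between BEGIN/END markers, while DER is raw binary.

```shell
# Generate a throwaway self-signed cert so the example is self-contained:
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo" \
  -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null

# PEM files start with a human-readable marker line:
head -1 /tmp/demo.crt

# Ask openssl to parse it as PEM; if this failed, retrying with
# "-inform der" would tell you the file is DER-encoded instead:
openssl x509 -in /tmp/demo.crt -noout -subject
```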
Why are you installing it in the /etc directory structure then moving it later? Is it to support Photon?
I may have it working with your configuration. I’m not sure why it didn’t work when I was doing it manually but it’s possible I “tied a knot” earlier in the process that prevented it from working.
Is the process the same for Ubuntu and Photon? I'm trying Photon and it's not injecting the kickstart file
- PACKER_VAR_FILES=/opt/image-builder/images/capi/config.json make build-node-ova-vsphere-photon-3
That's my build line
I'm away travelling for 2 weeks and don't have my environment readily available to take a look. If you don't figure it out by then I will try and reproduce and see if I can figure out why it's failing for you
Not sure if you're back but I still haven't figured out the Photon issue.
besides the local builds, is there a rule/best practice for running some of the targets using Docker?
I have been struggling with the make deps-ova setup; in a lot of cases the Python deps get into a mess
I've seen some instances where our dependency scripts have tried to install things in non-ideal locations before, it doesn't hurt to run it containerized if that is the case with the deps-ova target
can someone help with a basic question? I was able to deploy my own custom AMI with the cluster-api image-builder. However, it doesn't show up in "clusterawsadm ami list". Let me know how I can use my own custom AMI while deploying a cluster on AWS using cluster-api. I have been searching for pointers in the documentation but couldn't find any
I am sure this is something basic
thanks for the pointer, i see the spec has the following
spec:
  template:
    spec:
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: t3.large
      sshKeyName: #####
Apologies, for folks that are finding this and haven't figured it out:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  ...
spec:
  template:
    spec:
      ...
      ami:
        id: ami-012345678910
Regarding the CAPI image: make build-qemu-ubuntu-2004 currently fails for me with no matching host key type found (full error message in a reply).
I think this is related to the latest OpenSSH release 8.8 (), which disabled the ssh-rsa (SHA-1) signature algorithm.
When I set the options mentioned in the OpenSSH release notes (ANSIBLE_SSH_ARGS="-oHostKeyAlgorithms=+ssh-rsa -oPubkeyAcceptedAlgorithms=+ssh-rsa" make [...]) it works again.
Should I open a ticket for this? I think we have two options here:
a) Update the key (images/capi/cloudinit/{id_rsa.capi.pub,user-data})
b) Add these options to the default ANSIBLE_SSH_ARGS
Error with context:
==> qemu: Executing Ansible: ansible-playbook -e packer_build_name="qemu" -e packer_builder_type=qemu -e packer_http_addr=10.0.2.2:8482 --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url= containerd_sha256=591e4e087ea2f5007e6c64deb382df58d419b7b6922eab45a1923d843d57615f pause_image=k8s.gcr.io/pause:3.4.1 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.5.4 crictl_url= crictl_sha256=44d5f550ef3f41f9b53155906e0229ffdbee4b19452b4df540265e29572b899c crictl_source_type=pkg custom_role= custom_role_names= disable_public_repos=false extra_debs= extra_repos= extra_rpms= http_proxy= https_proxy= kubeadm_template=etc/kubeadm.yml kubernetes_cni_http_source= kubernetes_cni_http_checksum=sha256: kubernetes_http_source= kubernetes_container_registry=k8s.gcr.io kubernetes_rpm_repo= kubernetes_rpm_gpg_key=" " kubernetes_rpm_gpg_check=True kubernetes_deb_repo=" kubernetes-xenial" kubernetes_deb_gpg_key= kubernetes_cni_deb_version=0.8.7-00 kubernetes_cni_rpm_version=0.8.7-0 kubernetes_cni_semver=v0.8.7 kubernetes_cni_source_type=pkg kubernetes_semver=v1.22.2 kubernetes_source_type=pkg kubernetes_load_additional_imgs=false kubernetes_deb_version=1.22.2-00 kubernetes_rpm_version=1.20.9-0 no_proxy= python_path= redhat_epel_rpm= reenable_public_repos=true remove_extra_repos=false systemd_prefix=/usr/lib/systemd sysusr_prefix=/usr sysusrlocal_prefix=/usr/local load_additional_components=false additional_registry_images=false additional_registry_images_list= additional_url_images=false additional_url_images_list= additional_executables=false additional_executables_list= additional_executables_destination_path= --extra-vars ansible_python_interpreter=/usr/bin/python3 -e ansible_ssh_private_key_file=/tmp/ansible-key801906084 -i /tmp/packer-provisioner-ansible4185103635 /home/jt/git/image-builder/images/capi/ansible/node.yml
qemu:
qemu: PLAY [all] *
==> qemu: failed to handshake
qemu:
qemu: TASK [Gathering Facts]
qemu: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 43119: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}
qemu:
qemu: PLAY RECAP *
qemu: default : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
qemu:
==> qemu: Provisioning step had errors: Running the cleanup provisioner, if present...
==> qemu: Deleting output directory...
Build 'qemu' errored after 5 minutes 18 seconds: Error executing Ansible: Non-zero exit status: exit status 4
==> Wait completed after 5 minutes 18 seconds
==> Some builds didn't complete successfully and had errors:
--> qemu: Error executing Ansible: Non-zero exit status: exit status 4
==> Builds finished but no artifacts were created.
$ ssh -V
OpenSSH_8.8p1, OpenSSL 1.1.1l 24 Aug 2021
I only use the build-qemu-ubuntu-2004 target, so I can't tell if it affects other targets. The issue happens for me with latest master (5f3d1d6998a29ac1de63f0ae914bcb266a242078).
@codenrhoden the OVAs created by the VMware tool aren't able to be imported into vCenter 6.5 via govc, only via the UI
2021-10-04 22:39:57,899 | ERROR: [04-10-21 22:39:56] Warning: Line 142: Unable to parse 'flags.vvtdEnabled' for attribute 'key' on element 'Config'.
I suspect it's failing because 6.5 can't make sense of the NVRAM config in the OVA, although the OVA is still targeting 6.5
[04-10-21 22:39:56] Warning: Line 143: Unable to parse 'flags.vbsEnabled' for attribute 'key' on element 'Config'.
govc: file does not exist
The complete image building happens on vmx-13; we basically upgrade the VM after deploy to the max the vCenter supports and set props like uuid (for the CSI to function properly) via govc
We saw something similar where OVA built using tar had issues getting imported into the vC from the UI. Hence ovftool was introduced as an option to build the OVA. Do you have access to ovftool to try that out? Either way, it seems to be an issue.
I'm trying to build a Photon image using a setup that already does Ubuntu. But Photon starts and says the network is unreachable and it can't get the kickstart file.
@Maximilian Rink Regarding the cloud-init bug, just wanted to check if you are working on a PR based on akutz's suggestions here
Hello,
I am trying to install image builder project in my local environment and I have ansible installed
srajashekar@srajashekar-a01 capi % pip3 show ansible
However, when I try to run make deps or make build-do-ubuntu-2004 to build an image, I am facing this error:
Name: ansible
Version: 4.7.0
Summary: Radically simple IT automation
Home-page:
Author: Ansible, Inc.
Author-email: info@ansible.com
License: GPLv3+
Location: /Users/srajashekar/Library/Python/3.9/lib/python/site-packages
Requires: ansible-core
Required-by:
srajashekar@srajashekar-a01 capi %
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
make build-do-ubuntu-2004
ansible 4.7.0 requires ansible-core<2.12,>=2.11.6, but you have ansible-core 2.11.5 which is incompatible.
Successfully installed ansible-core-2.11.5
User's Python3 binary directory must be in $PATH
Location of package is:
ansible
Location: /Users/srajashekar/Library/Python/3.9/lib/python/site-packages
$PATH is currently: /usr/local/opt/mysql-client/bin:/usr/local/opt/gnu-sed/libexec/gnubin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/VMware Fusion.app/Contents/Public:/usr/local/go/bin:/Users/srajashekar/bin::/Users/srajashekar/go/bin/ginkgo:/Users/srajashekar/Library/Python/3.9/lib/python/site-packages:/Users/srajashekar/wcp/image-builder/images/capi/.local/bin
make: *** [deps-ami] Error 1
Requirement already satisfied: pycparser in /usr/local/lib/python3.9/site-packages (from cffi>=1.12->cryptography->ansible-core==2.11.5) (2.20)
User's Python3 binary directory must be in $PATH
Location of package is:
ansible
Location: /Users/srajashekar/Library/Python/3.9/lib/python/site-packages
$PATH is currently: /usr/local/opt/mysql-client/bin:/usr/local/opt/gnu-sed/libexec/gnubin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Applications/VMware Fusion.app/Contents/Public:/usr/local/go/bin:/Users/srajashekar/bin::/Users/srajashekar/go/bin/ginkgo:/Users/srajashekar/Library/Python/3.9/lib/python/site-packages:/Users/srajashekar/wcp/image-builder/images/capi/.local/bin
make: *** [deps-do] Error 1
Check out the container image for image builder, it might help with this issue.
https://github.com/kubernetes-sigs/image-builder/releases/tag/v0.1.9
@voor when I try to run the make target for docker-build I’m getting this error
srajashekar@srajashekar-a01 capi % make docker-build
# We must pre-pull images
docker pull docker/dockerfile:1.1-experimental
1.1-experimental: Pulling from docker/dockerfile
612615616619: Pull complete
Digest: sha256:de85b2f3a3e8a2f7fe48e8e84a65f6fdd5cd5183afa6412fff9caa6871649c44
Status: Downloaded newer image for docker/dockerfile:1.1-experimental
docker.io/docker/dockerfile:1.1-experimental
docker pull docker.io/library/ubuntu:focal
focal: Pulling from library/ubuntu
7b1a6ab2e44d: Pull complete
Digest: sha256:626ffe58f6e7566e00254b638eb7e0f3b11d4da9675088f4781a50ae288f3322
Status: Downloaded newer image for ubuntu:focal
docker.io/library/ubuntu:focal
bash: line 1: gcloud: command not found
DOCKER_BUILDKIT=1 docker build --build-arg PASSED_IB_VERSION=v0.1.10-66-gdab2b88f-dirty --build-arg ARCH=amd64 --build-arg BASE_IMAGE=docker.io/library/ubuntu:focal . -t gcr.io//cluster-node-image-builder-amd64:dev
invalid argument "gcr.io//cluster-node-image-builder-amd64:dev" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
make: *** [docker-build] Error 125
srajashekar@srajashekar-a01 capi %
You don't need to make the container; it's already made at the repository URL in the release notes
I am new to image-builder and ansible and any help to set my env will be appreciated!
No Agenda items are present for today's office hours, so I marked the meeting as canceled.
Hi 👋, I opened a PR to add support for building OpenStack qemu-kvm CAPI images using a container. I read through the contributor guide but likely still missed something. Please let me know if there is something I need to change. Thanks!
i'd love to see (set a default Containerd imports directory) merged and in a release soon. looks like all comments are addressed and the PR is ready to go. @jsturtevant @codenrhoden @Peri sorry to ping you directly but you've been helpful at reviewing the PR so far 🙏
No agenda items for today's office hours, so I marked it as canceled.
out of interest, what is the process now for☝️to be used in provider image builds?
sorry, I'm a n00b in the image-builder world... is there a doc or some reference I can read on the process? From what you said, I assume you mean there is a release of the image-builder container and then individual providers can use it to build their respective images - is that right?
there's actually a 0.1.10 tag and container, I just never drafted the release notes... But I agree, it's about time for a new one. There are a couple pending PRs that are pretty important, and once those are in I'll tag a new release.
@codenrhoden what would be the best way to test this? Use any of the latest images built from main?
@cecile that sounds right to me. Any recent images built with the Azure pipelines you have in place would have the latest cloud-init in them, and that cloud-init is buggy in a scenario triggered by CAPV. It doesn't appear to me that CAPZ images are affected, so we are leaving the cloud-init as-is. But yeah, if it turns out that CAPZ images are affected, you would know right away because the images wouldn't join into a cluster
@codenrhoden somewhat related: do we have end-to-end tests for actually standing up a CAPI cluster with the built images from CI on PRs to main?
If not: we have built something for Metal3 and CAPV internally, but on GitLab CI not on prow. We could port that over tho
so far it has been provider dependent. If I understand correctly, CAPZ does test the images they build with the Azure pipelines. I think CAPG does too, with the nightly builds they've implemented. There are no ongoing E2E tests with up-to-date images for CAPV or CAPA that I am aware of, but I may be unaware. I would have to check with those projects.
we need to rearchitect the image builder somewhat for newer Ubuntu versions, as they only support subiquity. I've started a while back with but there are a few more changes that we will need to make for newer Ubuntu versions going forward. The question is, do we want to add Ubuntu 21.10 to image builder to start ironing out bugs before 22.04 hits, or do we want to wait?
that's a really good question. Maybe one for office hours? I do remember your PR re: subiquity, and I know it's gotten zero attention. 😞 It's definitely been a case of only looking at things for the LTS releases thus far.
I feel like we've mostly stuck to LTS releases thus far, but shaking things out ahead of time would be beneficial. I think the question becomes, do we remove a release, like 21.10, after 22.04 becomes available?
Hi All 👋, I was going through the CAPI image builder code and noticed that the base image for this dockerfile could use an image bump (focal -> rolling). Would you all be open to a PR from me that does this bump?
PR welcome. Just wondering, why rolling vs. latest? Isn’t “latest” latest LTS?
I am okay with "latest" 🙂 I was assuming that, we would want to keep it to a specific release and update it as needed.
IIUC, this release-specific tag potentially makes it possible to create reproducible builds. But moving to "latest" means the image tag is mutable and may result in a different base image depending on when the build was triggered, for the same dockerfile.
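To make the mutability trade-off concrete: a base image can also be pinned by digest instead of by tag. A hypothetical Dockerfile fragment (the digest shown is the ubuntu:focal digest that appears in a build log later in this channel, used purely as an illustration):

```dockerfile
# Tag reference: mutable, resolves to whatever the tag points at build time
# FROM docker.io/library/ubuntu:focal

# Digest reference: immutable, always the exact same image bytes
FROM docker.io/library/ubuntu@sha256:626ffe58f6e7566e00254b638eb7e0f3b11d4da9675088f4781a50ae288f3322
```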
looking at Ubuntu - Official Image | Docker Hub it looked to me like rolling was even less release specific: “The ubuntu:latest tag points to the “latest LTS”, since that’s the version recommended for general use. The ubuntu:rolling tag points to the latest release (regardless of LTS status).”
oh wow. I learnt something new today. Thank you @cecile. I will use latest in that case
Hi all, are there any agenda items for tomorrow’s office hours? I don’t see anything on the meeting notes currently
I didn't see any as of this morning, so I just marked it as canceled. Let me know if you want to change that. I've got a couple things brewing that I'll hopefully put on the agenda in two weeks.
Team, in the default images that are created using the image builder project, what are the typical firewall/iptables rules in the images? I could not find any reference in the code. In our case, we are having to disable the firewall as otherwise worker node to API server communication, node to node communication, node port services etc do not work.
Hi @Shyam P R most of those rules are not modified from whatever the upstream provided image is, so you can either modify them prior to image builder or afterwards.
We recently merged to make it easier to override containerd configuration. However, it turns out that containerd only merges configuration at the section level (). Just wanted to raise awareness of this.
can someone help me review this PR - ?
No agenda items for office hours as of last night, marked it as canceled.
I will use that time to review the above PR, and the outstanding one for adding Rocky Linux to QEMU/Raw builders
I do think we should tag v0.1.11. I think we are in a good spot for it, with some recent fixes going in for AMI + Amazon Linux 2, and some critical ones for OVAs (NTP fixes, Photon AppArmor).
Anyone know of anything pending that looks important? cc/ @Amim Knabben
we have @Peri's updates on a Windows nodes timezone issue, plus the capability to install OpenSSH from a URL source
Team, please review the PR to add Oracle Cloud Infrastructure (OCI) support in image-builder. Sorry, this is my first PR, so apologies beforehand for any mistakes wrt procedure/code.
Gentle reminder for this @codenrhoden @cecile @naadir @kiran keshavamurthy
Ack. I know Kiran was able to take a look yesterday. I should have time tomorrow. Sorry for the delay.
Gentle reminder for this review @codenrhoden @cecile @naadir @kiran keshavamurthy
Gentle reminder for the review @codenrhoden , the ova tests are still failing though, for all PRs
Thanks for the review @codenrhoden, @cecile it would be great to get your review also as your original approval was removed by an update
One more fix for the capi ansible code: , can someone help me review this PR?
@codenrhoden It looks like the OVA CI is failing consistently. It doesn't seem to be a blocking test. Is this a known issue?
Failing with this error:
Build 'vsphere' errored after 2 minutes 9 seconds: Post "": dial tcp 54.70.161.229:443: connect: connection timed out
==> Wait completed after 2 minutes 9 seconds
==> Some builds didn't complete successfully and had errors:
--> vsphere: Post "": dial tcp 54.70.161.229:443: connect: connection timed out
I think it may have been blocked from internet access like many other VMC vCenters due to the log4j CVE. Not sure in this case, but I've seen that in multiple VMC instances since the CVE was announced
sorry I didn't respond here, been a busy few days for me. I had Kiran get this fixed up once we got the right networking and firewall settings in place.
I don't see any agenda items, so I marked the office hours as canceled. I'll just throw out that I'd like to tag v0.1.11 tomorrow. We've been talking about doing it for a while, and I was personally waiting until a recent Photon OVA issue was resolved, and it has been, so it seems like a good time. I know a few people have asked for it, and there's been a good number of Windows related fixes as well.
that's... weird. let me check that out. docs are supposed to be published after every merge
I'll have to play with it a bit. It looks like it's not configured correctly. It's linked to inside of capi.md, but in the published page it's a 404. I think I see why. It would also be nice to just have Windows show up in the side-bar nav. It's all in the configuration I think; the docs are getting published fine, so it should be an easy fix.
so it is publishing the latest docs but just the page is missing from the nav?
@jsturtevant PR to get those Windows docs to show up:
tag v0.1.11 is made. And I finally published release notes for v0.1.10. homer-disappear
Will make sure v0.1.11 notes are done today.
Facing issues while running make build-qemu-centos-7, any idea what is happening?
Adding the following entry in ansible.cfg helped me, but it's pretty slow
[defaults]
....
timeout = 120
I don't see anything obvious from the screen shot. Unfortunately I don't have any experience with the QEMU builder
n/m, it may be because of nested virtualization that performance is not that great; will try running on bare metal and check how this goes!
I have one open PR that seems to be failing consistently on an unrelated job -
packer build -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/kubernetes.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/cni.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/containerd.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/ansible-args.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/goss-args.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/common.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/config/additional_components.json" -color=true -var-file="packer/ova/packer-common.json" -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/packer/ova/photon-3.json" -var-file="packer/ova/vsphere.json" -except=esx -except=local -only=vsphere-clone -var-file="/home/prow/go/src/sigs.k8s.io/image-builder/images/capi/ci-photon-3.json" packer/ova/packer-node.json
vsphere-clone: output will be in this color.
==> vsphere-clone: Cloning VM...
Build 'vsphere-clone' errored after 11 seconds 801 milliseconds: Error finding network: path 'sddc-cgw-network-8' resolves to multiple networks
@codenrhoden Travis, can you PTAL again, cleaned up the open issues
@Amim Knabben (and anyone with PRs), we are definitely aware of the issues with OVA CI. I'm working to address the problem now (it's an infra issue). I'm hoping to have it resolved in the next day or so, but if it continues to be an issue we can consider making that CI test non-blocking. All OVA CI tests are going to fail right now.
No not really. I can write an issue up, though, so it can be referenced. Will be a little while before there is a PR. I'll write one now.
wow, you are right. it was for a while, but I completely forgot that it wasn't anymore.
I would have felt real silly if I went to go turn that on and found it was already there. Well, once this gets sorted out, I'll set it back. 😆
@codenrhoden the ova build seems to be failing, can we make it non-blocking? Sorry if it was already made so
They are currently non-blocking. And I will definitely get to this today.
Hi all, I’m going to start a PR to update the OWNERs file and:
let me know if there are any objections / considerations I should take into account before doing this. Also happy to split 3 from 1 and 2 if we want to approve the PRs separately.
cc @akutz @jdetiber @figo @justinsb @luxas @moshloop @timothysc (you are all listed as maintainers currently)
If there is an emeritus tag, I'll take it since I created the project along with @jdetiber, but I am no longer actively involved.
No agenda items for today's office hours. I marked it as canceled in the doc.
hi, I am new to image-builder. This might be a stupid question. I was running make build-node-ova-local-photon-3 and encountered FileNotFoundError: [Errno 2] No such file or directory: 'vmware-vdiskmanager'. I googled this error and people say VMware Fusion should already have vmware-vdiskmanager built in. Anyone have any ideas to fix this? Thank you!
Hi @Yiyi Zhou! It's most likely that the folder that contains vmware-vdiskmanager is not in your PATH environment variable. On my Mac, VMware Fusion adds that executable in this folder:
$ which vmware-vdiskmanager
/Applications/VMware Fusion.app/Contents/Library/vmware-vdiskmanager
so you probably need to add /Applications/VMware Fusion.app/Contents/Library to your PATH
Thank you Travis! You are right. I added export PATH=$PATH:/Applications/VMware\ Fusion.app/Contents/Library and now it worked.
@codenrhoden I have another question. So I didn't change any files and ran make build-node-ova-local-ubuntu-2004 . I opened the vmx file in Fusion, expecting to login with default username(builder) and password(builder) in packer-common.json. But it didn't work.
I tried appending the key to the ~/.ssh/authorized_keys file locally, still unable to login.
Hi Yiyi. Since these images are intended for use with CAPI, they are setup to expect an interaction that CAPI automatically performs. Namely, this is the injection of cloud-init metadata to create a user and add an SSH key.
The best way to go about this when working with Fusion is to run the hack/image-post-create-config.sh script. The way I do it is to import the OVA into Fusion, but not start the VM. Before you start the VM, run the above script, which will create the capv user and add the SSH key that is present in the image-builder repo. Then start the VM and you can SSH with something like ssh -i cloudinit/id_rsa.capi capv@IP.
I just took a look at the docs, and I'm noticing that is out of date. 😞 It definitely needs to be corrected
FWIW Travis the issue was her local copy of the key lacked the correct file permissions, and SSH was balking at her.
I pointed her to the hack/image-ssh.sh script, which should perhaps automatically apply the correct perms to the key in the examples area.
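For reference, the permissions fix itself is a one-liner: SSH refuses private keys whose mode is group- or world-readable. Shown here on a temporary stand-in file, since the real key path (cloudinit/id_rsa.capi) is repo-relative:

```shell
# The real fix would be: chmod 600 cloudinit/id_rsa.capi
# Demonstrated on a throwaway file so this snippet is self-contained:
key=$(mktemp)
chmod 600 "$key"
ls -l "$key"    # mode column now reads -rw-------
```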
FYI, the builder user is definitely locked and can't be used. If you are booting the VMX directly (instead of importing the OVA first), you may be able to SSH into it by grabbing the IP from Fusion, using the capv user, and pointing to the SSH key found in the cloudinit directory, but I haven't done it that way in a while. But that set of files is injected with the metadata automatically here:
And yeah, the builder user gets locked as part of the shutdown command.
I am not working on upstream at the moment. @codenrhoden @Sanika Gawhane, can we get this sorted please?
@codenrhoden @Sanika Gawhane let me know if any meeting required to discuss this?
There is a new issue with WS2022 build in the Azure VHD job. It is an OS packaging issue and I don't have a workaround as of now so will open a PR shortly to disable the job until it is resolved.
@jsturtevant If you like, me or @kiran keshavamurthy could open the PR to set required: false just to unblock things.
sorry, was in SIG meetings, but I don't think we should disable all the tests, just WS2022:
OVA CI is working again. Finally!
No agenda items for this morning's office hours - marked it as canceled
📣 📣 📣 hello, a couple of message on behalf of the SIG Cluster Lifecycle leads
cc @cecile @codenrhoden (from the OWNERS file of image-builder)
@codenrhoden @kiran keshavamurthy @jsturtevant should we jump on a quick call at some point and do the annual report together?
I’m going to be oof end of next week, could we do this tomorrow or Friday?
I'm pretty free today as well. My group has been trying out "no meeting Fridays", so that works!
no worries I did it for capz and capi already it’s pretty straightforward we can go through it together
@jsturtevant @kiran keshavamurthy are y’all available?
Team, is there a way to skip installing open-vm-tools using image-builder? We are trying to use image builder in Oracle Linux 8 and in Arm architecture, and open-vm-tools is not present which is causing the build to fail.
Team, any reason we are not bumping the kubernetes version here - to something newer? Also I don't see that we have defined in the customisation how to update the kubernetes version - , is that something we should try to add?
From the book page that you linked:
The version of Kubernetes to install. The default version is kept at n-2.
Thanks @Apricote, looks like this section is also misleading
See Customization section below for overriding this value
As the customisation section does not explain the kubernetes properties, if no one else picks it up I will try to create a PR tomorrow
Can I please get an ok-to-test for the PR to bump the kubernetes version to n-2, plus minor doc fixes?
No agenda items for today's office hours. Marked it as canceled. Will plan to take a look at the above PRs during that time.
Hello all, I've been using image-builder for a while now. I was exploring the CLI for image-builder in the repo and it looks like it's not under active development. Is this still on the roadmap? I can help contribute here if this is still a direction the community wants to pursue.
yeah, it's definitely true that the CLI isn't getting any attention at the moment, and hasn't been for some time. 😞 There is no current agreed-upon design of how to approach it, and it's been well over a year since it was even discussed.
For me and a team of engineers I work with, we've been in a state of "yeah, we will have resources to work on that in a month or two" for well over a year now. And honestly that is still the case -- something we still think we want to do, but just don't have the bandwidth/resources to look at it.
ah, no worries. I will take a look at the repo and see if I can hash out a design when I get some time. We can talk about it/discuss more in the office hours whenever I get some direction 🙂 Thanks for the info though.
Team, while doing make deps on an Ubuntu box, I get the following:
Installing collected packages: wheel, pip
but the documentation only talks about adding the following to PATH:
WARNING: The script wheel is installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The scripts pip, pip3 and pip3.8 are installed in '/home/ubuntu/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
export PATH=$PWD/.bin:$PATH
Looks like both $PWD/.bin and $HOME/.local/bin have to be on PATH. Is it OK if I create a PR to add both directories to PATH?
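A sketch of what that combined PATH setup could look like (directory names taken from the messages above; adjust to your shell profile):

```shell
# Repo-local .bin (populated by the deps targets) plus pip's per-user
# script directory, both prepended to PATH:
export PATH="$PWD/.bin:$HOME/.local/bin:$PATH"

# Quick sanity check that both entries made it in:
echo "$PATH" | tr ':' '\n' | grep -E '(\.bin|\.local/bin)$'
```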
Seems like it would be fine to me. I wonder if this is some vestige of the docs being written when everything was on Python 2.7 and now things are fully moved over to Python 3.x
@codenrhoden I see you have applied the lgtm label but not approved this, any reason? Sorry for bugging you
sometimes it works out that way anyways, but I at least like to give some time for someone else to look
yeah that is great, more eyes the better, I am new to this 🙂, hence wanted to check.
Team, can we get another set of eyes on this PR please, we have got the review from @codenrhoden
Thanks @jsturtevant, can I get an approval if that's ok from you or @codenrhoden? I will add the comment in the next PR I have to raise to fix the thread -
Sorry to bother again for this @codenrhoden @jsturtevant but I had to rebase the PR for the ova ci job fix, updated the missing comment as well. Please review
Hello team, does anyone have/know some sort of hardening that you use on top of the image-builder image? I found some generic OS hardening like dev-sec repo on GitHub, but I’m curious if something is standard across the k8s community.
I don't know of anything standard at the moment. I know of some vendors that use their own additional Ansible roles for hardening, but I don't know what the status of those becoming public is.
got it, thanks. Do you mind sharing some links if you have them handy? If not, its fine 🙂
Everything I am aware of is non-public right now. I have good reason to believe that may change within a few months, but who knows when it comes to deadlines and releases. They always move around. 🙂
There were no items in the office-hours agenda, so I marked it as canceled for today.
Team, can someone with knowledge of the vsphere image builder job check why the latest builds of pull-ova-all are failing (eg: )? It is failing for all the PRs, so I don't think it is PR specific. The error is:
vsphere-clone: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"echo /tmp/.ansible\"&& mkdir \"echo /tmp/.ansible/ansible-tmp-1647239786.000533-1078-173893619533102\" && echo ansible-tmp-1647239786.000533-1078-173893619533102=\"echo /tmp/.ansible/ansible-tmp-1647239786.000533-1078-173893619533102\" ), exited with result 1", "unreachable": true}
TIA for the help.
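The error text itself points at a workaround; a hypothetical ansible.cfg fragment along those lines (section and key names taken from the error message):

```ini
[defaults]
# Root Ansible's temporary directory under /tmp, as the error suggests
remote_tmp = /tmp/.ansible/tmp
```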
I randomly have errors with building custom windows 2019 AWS AMIs (using image-builder, pre pulling a few docker containers). It sometimes works and sometimes doesn't.
Issue seems to be the sysprep step in the end (executes C:/ProgramData/Amazon/EC2-Windows/Launch/Scripts/SysprepInstance.ps1). I'm looking into debugging this, but maybe anyone knows right away what that could be?
==> windows-2019: Provisioning with powershell script: packer/ami/scripts/sysprep_prerequisites.ps1
windows-2019: Removing default unattend.xml file...
windows-2019:
windows-2019: TaskPath TaskName State
windows-2019: -------- -------- -----
windows-2019: \ Amazon Ec2 Launch - Instance I... Ready
windows-2019: Using cloudbase-init unattend file for sysprep: C:\Program Files\Cloudbase Solutions\Cloudbase-Init\conf\Unattend.xml
==> windows-2019: Provisioning with Powershell...
==> windows-2019: Provisioning with powershell script: /tmp/powershell-provisioner372200148
windows-2019:
windows-2019: C:\Users\Administrator>reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server" /v fDenyTSConnections /t REG_DWORD /d 1 /f
windows-2019: The operation completed successfully.
==> windows-2019: Provisioning step had errors: Running the cleanup provisioner, if present...
==> windows-2019: Terminating the source AWS instance...
unfortunately not. There's a sysprep troubleshooting doc from AWS which might help, but it requires connecting to the machine. Will try that on Monday.
troubleshooting sysprep is tough; sounds like you have a way to do it, good luck!
thanks 🙂 funny enough I tried it several times yesterday and today and all of them worked. Fingers crossed that this continues 😉
@cpanato please see ( cc @codenrhoden and @cecile )
Damn you’re too fast for me. I just opened a branch for it. 😅
@Batuhan Apaydın (developer-guy) has joined the channel
I see there are agenda items for this week's office hours. I've been on PTO this week, and won't be able to attend. I will be back online later today, though, so I'll look through notes and check out what was discussed.
with the return to office, the 8am meeting time doesn't work well for me anymore as it's right in the middle of my commute, so I likely won't be able to attend most weeks.
I think it's time to consider rescheduling this to a time that isn't at 8am for PDT folks. Our attendance is basically nil at this point. Though I am really appreciative of people leaving notes in the doc last week when no maintainers attended.
There are no agenda items this week, so I'm going to mark the meeting as canceled. Though there was an open thread about what's up with the CLI somewhere, I'll track that down and respond to it here.
No maintainers attended @codenrhoden 🙂 There are some questions/requests to the maintainers in the notes. Enjoy your PTO!
Hi folks! I'm trying to get merged as part of our effort to ensure Flatcar images are aligned with the rest of the supported distros in terms of GOSS validations (right now some Flatcar builds are either broken due to GOSS errors or unvalidated due to "special" GOSS configuration which diverges from the generic one). This is a small PR (around 40 LOC changed) which ensures Flatcar has a similar GOSS config to the rest of the distros. The PR is blocking other work I'm doing in both image-builder and CAPV (because we need working vSphere images to test CAPV changes which we currently don't have).
@jsturtevant @moshloop looks like you've been automatically tagged as reviewers. Is there anything I can do to make it easier for you folks to review the PR? Alternatively, who could I tag for a review if both of you are busy these days? Thanks! 🙏
@jlieb To answer your question about the CLI and it not compiling (which you fixed, thanks!), it is not currently in use. There were some plans around using it to do a few things, mainly as a pre-processor to generate the Packer configs (which would allow us to have much better logic, and tests, around what the Packer configs look like). There are currently no design docs around it, and the work has no volunteers. The individual who was taking it up moved on to other responsibilities. So as you say, it hadn't been touched in quite some time.
I've been saying for at least a year that I thought I had some team members that may be able to pick up this work "in a couple months", but I've been saying that every few months now and it hasn't happened. Priorities... I wouldn't expect any effort on it in the short term, and if the effort does get picked up again, I think it would be starting from scratch to first get some agreement on goals and direction. I don't think the project has agreement on that, just a few personal opinions.
I also said a few weeks back that I'll take a look and propose some design, but didn't get time. I'm planning on writing something down next week; hopefully I can get some thoughts and feedback from the community. If others have an idea or design, I'd love to jump in and contribute as well.
Thanks a lot for clarifying Travis, and thanks for picking this up Vignesh! Just to clarify, there is nothing urgent on my end around this CLI - I was motivated to poke around and fix the build simply because my editor (VS Code) kept complaining about build failures while I was editing JSON files 😆 But of course it's nice to keep enhancing the project.
Is there a difference between the QEMU and the raw build targets in image-builder? I'm a bit confused by the existence of both since both targets seem to be using the qemu Packer builder type.
Took a look and /lgtm'd. I can approve if needed, too. Thanks for pinging me, as I had missed that folks were looking for my input.
thanks Travis, would be great if you approve, we are in need of it to continue a few downstream testing
Yeah, let me poke some people about it. I had thought this was related to Windows (my mind saw PowerShell), so was expecting windows reviewers to chime in. I was mistaken there.
n/m, this one is related to a different cloud platform for the ppc64le architecture.
@codenrhoden wondering if you got a chance to poke anyone to review this PR?
Ack. Sorry, I have been swamped and distracted by sick kids lately. @kiran keshavamurthy, if you have any bandwidth, can you take a look at this one? It's on my plate as well.
thanks. one of them is home from school with me, so lots of constant distractions!
Gentle reminder - @kiran keshavamurthy I have fixed the review comment, can you please review the PR now?
@kiran keshavamurthy ci is green now ptal.. @codenrhoden see if you can also ack..
Hello all, I have a doubt: I'm curious why the open-vm-tools package installation is not a step in the Centos 8 OVA kickstart config. This causes Centos/RHEL 8 OVA builds to be stuck in Waiting for IP state, which then times out after 30 minutes. The VMWare console also shows that Open VMWare tools is not installed in this virtual machine, which corroborates the absence of the installation step. The installation is there in the Centos 7 ks.cfg. Is there a way to get around this?
I think the centos-8/RHEL-8 support was added for qemu builders only and not OVA. So it has not been validated. We’d be happy to review a PR to add RHEL-8/Centos-8 support for OVAs.
That is correct. In our CI workflow, we added the following patch for it that adds the rhel-8.json to the OVA packer configs, and it uses the centos/http/8/ks.cfg
https://github.com/aws/eks-anywhere-build-tooling/blob/main/projects/kubernetes-si[…]ge-builder/patches/0009-Add-support-for-RHEL-8-OVA-builds.patch
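For anyone who just needs the missing open-vm-tools install mentioned above, a minimal sketch of the kickstart change; the file name here is an assumption, adjust to the actual CentOS 8 ks.cfg path in your image-builder checkout:

```shell
# Illustrative sketch: ensure open-vm-tools is in the kickstart %packages
# section so the resulting OVA reports an IP back to vCenter.
# "ks.cfg" is a placeholder for the real CentOS 8 kickstart file.
cat >> ks.cfg <<'EOF'
%packages
open-vm-tools
%end
EOF
```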
When trying to build the Centos/RHEL 8 OVA, I get the error Kickstart file /run/install/ks.cfg is missing. I have tried serving the kickstart config checked in to the repo through both the http_directory and floppy_dirs options with the appropriate boot media path, but the build is not able to find the ks cfg file.
I also get the error Module floppy not found in directory /lib/modules/
@kiran keshavamurthy Can you take a look at this PR adding support for RHEL 8 OVA builds, including CI?
Addressed comments here. Changed the structure of goss-vars a little to accommodate for different packages for different versions of the same OS. Could you take another look @kiran keshavamurthy
Hi folks! Can someone please confirm if there's a way to set environment variables for, let's say, directly setting the values of fields mentioned in images/capi/packer/config/kubernetes.json to build the images instead of using PACKER_VAR_FILES? Or if there's a workaround to achieve the same instead of having to manually edit the config file every time an image has to be built for a new k8s release?
Any help or clues would be really appreciated, thanks!
if you want to change just a few variables you can do something like PACKER_FLAGS="-var=KUBERNETES_VERSION=1.22.8 -var=ANOTHER_VAR=value" make build-target
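If there are more than a couple of values to change, a var file may be tidier. A sketch; the variable names below are assumptions, verify against images/capi/packer/config/kubernetes.json in your checkout:

```shell
# Sketch: collect overrides in a JSON var file instead of individual -var flags.
# Variable names are assumptions; check packer/config/kubernetes.json for the real ones.
cat > overrides.json <<'EOF'
{
  "kubernetes_semver": "v1.22.8",
  "kubernetes_series": "v1.22"
}
EOF
# then point the build at it, e.g.:
# PACKER_VAR_FILES=overrides.json make build-ami-ubuntu-2004
```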
And I haven't tried building Windows, so not sure about this one either. 😕
I have a pipeline which builds a CAPI vSphere image using image-builder and it seems to be erroring but not giving details. Is there a way to look at why it’s happening or get logs from somewhere?
==> vsphere: Typing boot command...
==> vsphere: error typing a boot command (code, down)28, false: ServerFaultCode: A general system error occurred: Invalid fault
==> vsphere: trying key input again
==> vsphere: Error running boot command: error typing a boot command (code, down)28, false: ServerFaultCode: A general system error occurred: Invalid fault
==> vsphere: Clear boot order...
==> vsphere: Power off VM...
==> vsphere: Deleting Floppy image ...
==> vsphere: Destroying VM...
Build 'vsphere' errored after 48 seconds 532 milliseconds: Error running boot command: error typing a boot command (code, down)28, false: ServerFaultCode: A general system error occurred: Invalid fault
I was able to get it working locally and not in the pipeline, which I thought was odd. But it works.
hey folks, what is the safest way to add custom Ansible roles to the image-builder? I want to run a few tasks that are not in-tree
yea, I'm using a volume mount from the host with custom ansible roles, just wondering about other options
Hey folks! I’d like to know if there’s a way to customise the "ami_name" property in the packer config using the packer flag.
That feel when image-builder hangs waiting for the instance to come up, but the debug SSH key that it outputs works and connects to a running instance. feelsbadman
No agenda items for office hours today, so I marked it as canceled.
As a heads up, I plan to tag the repo today. There's a few things pending that will add some new capabilities, and it would be good to tag right before.
Hi folks! Would love to get some eyes on since there are multiple pending PRs blocked on it.
TL;DR: This PR commits the Ignition files used by Flatcar builds to the image-builder repo so that they don't have to be consumed from an external repo.
Thanks!
no agenda items today, marking office hours as canceled (I was going to say I am not able to make it anyway)
I know there are several open PRs that I've been pinged on. I'm blocking off some time this afternoon to play catch up on reviews.
Hi aniruddha, are you using a very recent ssh client version? Check this comment and apply the same fix to gce's ANSIBLE_SSH_ARGS value and see if that helps
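For context, the usual culprit with recent OpenSSH clients (8.8+) is that ssh-rsa signatures are disabled by default, which breaks the RSA key Packer generates. A hedged sketch of the options that typically fix it; exactly how these args get wired into the build varies per provider config:

```shell
# Sketch: re-enable ssh-rsa signatures for the Ansible/Packer SSH connection.
# How this value is passed through to the build differs between provider configs.
export ANSIBLE_SSH_ARGS="-o PubkeyAcceptedKeyTypes=+ssh-rsa -o HostKeyAlgorithms=+ssh-rsa"
echo "$ANSIBLE_SSH_ARGS"
```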
@Jeremi Piotrowski Thanks 😄 it worked 🤞
This could be the exact same thing as just one message above yours (assuming CAPG -> GCP)
I'm not sure it is. I did look at that but that doesn't seem to have any issues with SSH timing out (nor am I using the qemu variants)
my bad, i thought i saw this resulting in ssh timeouts in the past as well (after multiple retries). it doesn't only affect qemu variants
I guess the best way is to connect to the VNC endpoint which packer spawns on every run of image-builder. You can see the password and port in the shell output.
So, it looks like it was something related to my local network (not sure what exactly) but once I was on our corp VPN it worked first time. 🤷
Can I please get a couple of sponsors to join the kubernetes-sigs org? I have contributed to image-builder and am also planning to move our CAPI provider repo for OCI to the kubernetes-sigs org.
PRs
Thanks @cecile, i have got 2 sponsors, will ping you in case I need further help
hmm, I've run into strange issues with ubuntu and qemu :S
Seems like the audit service isn't installed by default anymore on qemu systems. If I use the same iso and kickstart on vCenter I still get the package
qemu: TASK [node : Ensure auditd is running and comes on at reboot] **
qemu: fatal: [default]: FAILED! => {"changed": false, "msg": "Could not find the requested service auditd: host"}
looks like a regression from https://github.com/kubernetes-sigs/image-builder/commit/bc309118a3fe9db9c9d053e8d72d9fad7c43f1fa#diff-d74534cba8de8668a56[…]225ef8dae206955fffbf3135
as that also applies to the raw builder
Hi folks. should be good to merge now. Testing is easy: make build-qemu-flatcar
I'd love to get this merged as soon as we can since there is a bunch of pending CAPI work that's blocked on this PR.
Thanks! 🙏
Hi folks, I am trying to build a photon OVA for vSphere and trying to add a custom OVF property to the OVA, but defining the property in the JSON and setting that JSON file as the env variable OVF_CUSTOM_PROPERTIES is not appending the desired property to the OVF. Could you please correct me if I am doing something wrong?
I wanted this property to be added to the Cluster API Provider (CAPI) category
@swan are you also setting IB_OVFTOOL=1 to make sure to use ovftool to build the OVA.
Does anyone happen to know the minimum required IAM permissions needed by image-builder when building GCP VM images? The docs say to give the editor role but this is such a large scope of unneeded permissions.
I think it's just the following that's needed but would be great if anyone else can confirm.
compute.disks.create
compute.disks.delete
compute.disks.useReadOnly
compute.globalOperations.get
compute.images.create
compute.images.get
compute.images.getFromFamily
compute.images.getIamPolicy
compute.images.list
compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.getSerialPortOutput
compute.instances.setLabels
compute.instances.setMetadata
compute.instances.setServiceAccount
compute.machineTypes.get
compute.subnetworks.use
compute.subnetworks.useExternalIp
compute.zoneOperations.get
compute.zones.get
iam.serviceAccounts.actAs
If anyone is interested, I managed to confirm these permissions are enough today 🙂
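In case it helps others, the confirmed list above can be packaged as a least-privilege custom role; a sketch where the role ID, title, and project are placeholders, and the final command assumes gcloud is installed and configured:

```shell
# Sketch: define a custom role from the confirmed permission list above.
# Role title/ID and project name are made-up placeholders.
cat > image-builder-role.yaml <<'EOF'
title: Image Builder
description: Minimal permissions for kubernetes-sigs/image-builder GCP builds
stage: GA
includedPermissions:
- compute.disks.create
- compute.disks.delete
- compute.disks.useReadOnly
- compute.globalOperations.get
- compute.images.create
- compute.images.get
- compute.images.getFromFamily
- compute.images.getIamPolicy
- compute.images.list
- compute.instances.create
- compute.instances.delete
- compute.instances.get
- compute.instances.getSerialPortOutput
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.machineTypes.get
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zoneOperations.get
- compute.zones.get
- iam.serviceAccounts.actAs
EOF
# gcloud iam roles create imageBuilder --project=MY_PROJECT --file=image-builder-role.yaml
```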
Building the image build-qemu-centos-7 takes around ~30 mins. Does anyone have any thoughts on how to make it faster? Is there a way we can have incremental image layers so that creating/testing images can be faster?
For vSphere OVAs, we build base images and re-use it (clone builder). This decreases the build times. Not sure if something similar exists for qemu
has been ready to merge for a while and solves issues with building images locally with newer OpenSSH versions. I'm not sure who I can assign to it. Can someone please have a look?
hey @kopiczko, i've dropped a comment which i think would allow the CI to pass and allow this PR to get merged
my PR fails with FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/pip3.7' - anyone know why this could happen?
that is expected as it's trying to install pip, I think. The VHDs build, and the 2019 image is failing with:
Builds finished but no artifacts were created.
panic: runtime error: invalid memory address or nil pointer dereference
2022/08/12 07:50:00 packer-builder-azure-arm plugin: [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x1a933ce]
2022/08/12 07:50:00 packer-builder-azure-arm plugin:
2022/08/12 07:50:00 packer-builder-azure-arm plugin: goroutine 226 [running]:
2022/08/12 07:50:00 packer-builder-azure-arm plugin: github.com/Azure/go-autorest/autorest/azure.(Future).WaitForCompletionRef(0xc0001f8cc0, 0x546d9c0, 0xc0006ba000, 0x53f6420, 0xc0000b98b0, 0x540e140, 0xc0001f4a50, 0xc000f08120, 0xc000cd2140, 0xdf8475800, ...)
2022/08/12 07:50:00 packer-builder-azure-arm plugin: /home/circleci/project/packer/vendor/github.com/Azure/go-autorest/autorest/azure/async.go:174 +0x54e
2022/08/12 07:50:00 packer-builder-azure-arm plugin: github.com/hashicorp/packer/builder/azure/arm.(StepCaptureImage).captureImageFromVM(0xc000562e80, 0x546d9c0, 0xc0006ba000, 0xc0006e7b18, 0x18, 0xc0001a3120, 0x1c, 0xc0005626c0, 0xc000e0e3c0, 0x29)
sig-windows-2019: FAILED. See logs in the artifacts folder.
sig-centos-7-gen2: SUCCESS
sig-ubuntu-2004-gen2: SUCCESS
sig-windows-2022-containerd: SUCCESS
sig-centos-7: SUCCESS
sig-flatcar: SUCCESS
sig-ubuntu-1804: SUCCESS
sig-flatcar-gen2: SUCCESS
sig-ubuntu-1804-gen2: SUCCESS
sig-windows-2019-containerd: SUCCESS
sig-ubuntu-2004: SUCCESS
then you can navigate to artifacts -> azure-sigs and open sig-windows-2019.log
ah, thanks for the steps to debug - I didn't see it until now (and figured it out meanwhile myself) 😅
wondering what I could do about it - sounds like an issue with the azure packer plugin. Not sure if a retry would work?
looks like an issue in the scripts not naming the image properly. retrying the test for now is ok, looks like we got an issue open to look into it more
Morning y'all 👋 Looks like image-builder fails to build the new v1.25.0 release due to the change of default image registry (fails to pull coredns, likely more).
I've opened a PR to change the default registry to the new host:
I faced the same issue, we might need a release for the image-builder with this fix.
Yeah. That'd be awesome.
I think it can be worked around by providing that registry value in an override vars file but having it work "out of the box" would be nice 🙂
I haven't tried it though. But pretty sure all those vars can be overridden with a provided --vars-file
Nice! I'll add that as a note to the PR too 🙂
I guess you could just comment out the following in node.yml:
- include_role:
    name: containerd
Hey folks, can someone chime in on whether the upstream CI is building 1.24 OVAs right now?
@Kubernetes Moderator Service has joined the channel
@Kubernetes Moderator Service has joined the channel
I have created an issue for failing image builds (while installing the iptables-persistent package). I have also submitted a PR to fix it.
Just noticed @jsturtevant has accepted it for testing... Thank you
Hi all, I've been observing a rather weird issue with vSphere image builds lately in my environment. If the builder VM path I provide points to a subdirectory that is more than 1 level deep rather than a top-level directory (e.g. capv/vignesh vs capv in vsphere.json), the builder VM doesn't seem to pick up the preseed cfg/kickstart file properly and loads into the GUI install prompt. Has anyone observed the same? Or any idea if there's a folder-level setting I can take a look at? My vSphere is 7.0 and I'm running the latest commit of image-builder.
The latest Kubernetes releases - 1.22.14, 1.25.1 - include a version of kubelet that now expects kubernetes-cni@v1.1.1.
image-builder has it pinned to 0.8.7, causing (Ubuntu) builds to fail without overriding the kubernetes_cni* vars.
Is there a reason why we pin the version of kubernetes-cni package? Can we remove the version and instead rely on the dependency tree to install the needed version?
The vars could remain for those that need to pin to specific version but the default version can be set to * to allow ansible to install whichever version suits.
i am facing the same issue, I even tried overriding parameters, but that didn't work.
I was able to get it to work by adding "kubernetes_cni_deb_version": "**" to my PACKER_VAR_FILES json to get it to build. Was that what you tried? Did that not work?
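For reference, a sketch of that override as a var file. The values below are illustrative (the wildcard or pin you need depends on distro and package repo), and the var names are my assumption of the usual image-builder CNI vars, so verify them against your checkout:

```shell
# Sketch: override the CNI package vars via a Packer var file.
# Var names and values are assumptions; check images/capi/packer/config/cni.json.
cat > cni-overrides.json <<'EOF'
{
  "kubernetes_cni_semver": "v1.1.1",
  "kubernetes_cni_deb_version": "1.1.1-00"
}
EOF
# PACKER_VAR_FILES=cni-overrides.json make build-ami-ubuntu-2004
```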
For my test focusing on just ubuntu I only needed to overwrite the one var. Ideally all should be updated to match the latest version available, I guess.
I will give it a try, but eventually this has to be handled in image builder itself, shouldn't it?
Yeah 100%. I'm just not sure how best to handle it as it could be a breaking change for some.
I would think that's enough, yes. Though I wasn't able to find out the backwards compatibility for the latest CNI. Not sure how many Kubernetes versions back it would still work as expected.
Also, I tried setting all CNI vars and it worked for Ubuntu, Flatcar and amazon OS, but for centOS its giving below error.
Prevalidating AMI Name: capa-ami-centos-7-1.22.14-00-1663588776
Anything changed for CentOS too?
==> centos-7: No AMI was found matching filters: {
==> centos-7: Filters: [
==> centos-7: {
==> centos-7: Name: "root-device-type",
==> centos-7: Values: ["ebs"]
==> centos-7: },
==> centos-7: {
==> centos-7: Name: "virtualization-type",
==> centos-7: Values: ["hvm"]
==> centos-7: },
==> centos-7: {
==> centos-7: Name: "architecture",
==> centos-7: Values: ["x86_64"]
==> centos-7: },
==> centos-7: {
==> centos-7: Name: "name",
==> centos-7: Values: ["CentOS Linux 7 x86_64 HVM EBS ENA**"]
==> centos-7: }
==> centos-7: ],
==> centos-7: Owners: ["410186602215"]
==> centos-7: }
We don't work with CentOS images so couldn't say. 😞 Only have experience building Ubuntu and Flatcar.
Are you able to find that image if you search manually in the AWS console?
no, that image is not there, but only for CentOS and the recently released kubernetes versions
Prevalidating AMI Name: capa-ami-centos-7-1.22.14-00-1663588776
Are you trying to build that image or make use of it? (Are you responsible for building the CAPA-provided AMIs?)
That makes more sense 🙂
I assume the other Kube versions use the same CentOS base image. Did those also have issues finding the AMI?
Ah, maybe the upstream CentOS AMI name has changed? Let me see if I can find it myself.
I can't seem to find anything from the 410186602215 account. Any idea what account that is? Doesn't look like it shares any AMIs
oh now I get it, I think you won't be able to find it, because VMware uses an internal AWS account to host these CAPA-specific images
Yeah, looks like the CentOS ownerID to use for public images is 125523088429
I am wondering what might have changed, this worked for last k8s releases 🤔
If that's an internal VMWare account, wouldn't that mean no one else would be able to build a CentOS based image?
Maybe something in that account has changed? AMIs incorrectly deleted / set to private?
If it is account specific I will do some more digging, thanks for the help @Marcus Noble 🙂
No worries 🙂 If I get time I'll try and add CentOS to our pipeline to see if I can get it working or not but sounds like it might be an issue elsewhere.
I can file a PR for this if no objection.
cc @kiran keshavamurthy @codenrhoden
You can provide the value as part of the --var-file
Oh sorry I think you're right. That var isn’t exposed. 😞 yeah, release needed
Thinking about it, might be worth updating that PR to use a user variable so it can be changed in the future
that's a good suggestion; we use these owner IDs as AMI filters for other OSes as well, I think we need the same update there too
Oh nice! You should be able to override it then
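For anyone following along, a sketch of what that override might look like. The var name ami_filter_owners is my assumption from the AWS packer configs, and the owner ID is the public CentOS account mentioned above; double-check both before relying on them:

```shell
# Sketch: point the source-AMI lookup at the public CentOS owner account.
# The var name is an assumption; check the AWS packer config in your checkout.
cat > ami-overrides.json <<'EOF'
{
  "ami_filter_owners": "125523088429"
}
EOF
# PACKER_VAR_FILES=ami-overrides.json make build-ami-centos-7
```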
Nice! Still worth getting a new release so it's fixed for everyone though.
Agreed!!! @codenrhoden @kiran keshavamurthy could a new release be taken care of?
we also need a fix for the CNI versions used; image builder has hardcoded it to 0.8.x. A todo item before releasing a new version.
Looks like its fixed in main branch
Bumping this thread again, could we get a newer release with the latest changes? There are many fixes which could benefit image builds for newer k8s releases
The tag isn’t enough anyway. The container image also needs building and publishing.
@kiran keshavamurthy @codenrhoden @jsturtevant could you please help with the new release?
Looks like the 1.25.x builds are now failing due to the change of image registry
failed to pull image "k8s.gcr.io/coredns:v1.9.3"
Any chance we can get a new release? This fix is already in the main branch, just needs releasing 🙂
@kiran keshavamurthy are you able to help with releases? I haven't done one before
I’ve not as well.
Hello @codenrhoden, can you pls help us out here.
I believe Travis is the only person with the permission to cut a release. We should get a couple more folks added to that list. @jsturtevant if you are ok, I can open up a PR to get the 2 of us added. Maybe @cecile too?
I also haven’t done an image-builder release before, do we have docs on the process?
I don’t think so. Hopefully @codenrhoden is able to help out this time.
Hey folks. Definitely happy to help out however I can. I believe this is the location that would control being able to tag the repo: https://github.com/kubernetes/org/blob/44cf4faf10760dcc023dc4220b5e1a61875a93e1/config/kubernetes-sigs/sig-cluster-lifecycle/teams.yaml#L285
And if you need me to tag, I can do that too. Just confirm the right commit - I haven’t been looking at the repo lately (as I’m sure you’re aware) 😆.
Hey Travis 👋
Thanks a lot.
I think we can tag it with the latest commit 4b97ae8b85216ac9e5f187fe88a2097e7813e525 unless someone has any objections.
I pushed a tag for v0.1.13. That should kick off the staging container build as well. Hope it all works out!
Thanks Travis, can you pls document the steps somewhere so that we have it for the future.
@kiran keshavamurthy next steps would be to test out the staging builds and if all looks good, promote the image to prod. Sanika definitely knows how to do all that. You can make sure that the staging build worked correctly in test grid.
Sorry, I definitely thought people were aware of those next steps. Regardless, I will write it up. Should I just drop it in an issue, add it to the official docs, or put it in the repo wiki (which I don’t think we use)?
Hey @codenrhoden 👋
I was able to build AMI and Azure image with the staging container. Testing OVA build atm.
Here's the PR for promoting 0.1.13 image -
cc - @kiran keshavamurthy
Cool! looks like it is already approved. dancing-penguin2
Hi, I have put up a PR to pass console kernel params to debian-based images. How do I regenerate the image?
what are the default credentials to access a node through ssh?
Can you explain a bit more? which node? The node created during image building process?
It varies depending on which provider you're building for (some don't set at all). For AWS see
I don't think the ssh keys specified there are stored in the image, so when an instance is created using the image you can't use the same ssh key. I may be wrong though. The ssh key is only valid for the instance which was used to create the image.
I tried to look in here (QEMU image) and I see some username and password but they don't work
Oh, do you mean the image when finished and not during the building?
I think you'd want to handle that during the userdata stage, not during the building. I might be wrong though.
basically I'm trying to debug cloud-init errors... masters are provisioned on the infrastructure but fail at some point during initialization
oh, that's not ideal. I'm not sure then, unfortunately. 😞
Is anyone using image-builder with ubuntu 22.04? Any issues? I'm looking to update if it's nice and quick.
So, looks like it's not possible to use image-builder in its current state with Ubuntu 22.04. Looks like a newer version of openssh comes by default that no longer allows for ssh-rsa keys to be used.
Does anyone know what the process would be for updating the SSH key used by Ansible?
Doesn't look like I can override it so a new release would be required I think.
☝️ Ok, I think I might have made progress on this. It looks like updating the version of Ansible (and goss, not sure why) has allowed me to get further. The new Ansible version isn't a major version bump so I assume it'll be ok to get a PR up for a new release.
Once I've confirmed the build finishes as expected I will look at getting a PR put up.
Hi @Marcus Noble, just wanted to say thanks for that... I noticed the old ansible version recently too but was too lazy to do anything about it. And yes, ubuntu 22.04, thank you for that as well. Cc @kiran keshavamurthy
I'm just testing out ubuntu 22.04 builds for CAPA and CAPG. If all goes well I'll have another PR up with build tasks for those.
It is great to see all the work on adding Ubuntu 22.04 support 🎉
We are using these images with metal3. Is there any work ongoing for Ubuntu 22.04 for the make target build-qemu-ubuntu-2004-efi, so it becomes build-qemu-ubuntu-2204-efi?
I tried looking for it in the issues list but was not able to
Hey @knfoo I've only been focussing on AWS and GCP for my changes as it's what we need currently but we'll also be needing the qemu targets soon for use with OpenStack.
I'm hoping the changes are basically the same but haven't tried it out yet.
Hey @Marcus Noble
This is great - i will try it out after my vacation.
I was attempting to build a windows image today and I’m getting the following error:
fatal: [default]: FAILED! => {"attempts": 5, "changed": false, "dest": "c:\k\nssm.exe", "elapsed": 0.21601959999999998, "msg": "Error downloading '' to 'c:\k\nssm.exe': The remote server returned an error: (404) Not Found.", "status_code": 404, "url": ""}
When I attempt to go to I see the 404. It seems the URL is set here
hmmm seems like that url is bad. I downloaded nssm.exe and placed it in another public place and the image build worked
we are aware of this, you have found the mitigation. We are working on addressing it
Hey folks.
I'm trying to understand how passwords are set for WinRM for CAPI provider images. I see the OVA and VBOX configs hardcode winrm_password in their windows packer config, but AWS and Azure don't. How are the latter images setting the passwords at image build time?
Hello all, what's the current topic request format for office hours? If there is a doc, can someone point me to it? This one seems to be old. Or do we just post here the day before?
Hey all, I’m not sure what I’ve done to have the pull-ova-all tests failing in my pr
but I’m seeing the following
'packer' has been installed to /home/prow/go/src/sigs.k8s.io/image-builder/images/capi/.local/bin, make sure this directory is in your $PATH
TIA for any help 🙂
hack/ensure-goss.sh
/root/.packer.d/plugins/packer-provisioner-goss: OK
hack/ensure-ovftool.sh
rockylinux-8: FAILED. See logs in the artifacts folder.
ubuntu-2004: FAILED. See logs in the artifacts folder.
photon-3: FAILED. See logs in the artifacts folder.
flatcar: FAILED. See logs in the artifacts folder.
ubuntu-1804: FAILED. See logs in the artifacts folder.
centos-7: FAILED. See logs in the artifacts folder.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  1 7400k    1  111k    0     0   204k      0  0:00:36 --:--:--  0:00:36  204k
100 7400k  100 7400k    0     0  7815k      0 --:--:-- --:--:-- --:--:-- 17.7M
govc: Post "": dial tcp 10.2.224.4:443: i/o timeout
looks like maybe the centos-7 artifact failed in the pull-azure-sigs test https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_image-bui[…]igs/1584616530543382528/artifacts/azure-sigs/centos-7.log as well
@kiran keshavamurthy Reg: PR 1003, the new ubuntu versions use cloud-init to boot and it expects the drive to be labeled cidata to work. But for some reason setting floppy_label to that value does not seem to take effect. As a result I see it only loading the kickstart files (meta-data and user-data) as files on the cdrom (I'm able to set the cd_label tag and it gets reflected). However, for this to work either xorriso or mkisofs needs to be available to create the cdrom, which can then be mounted. Adding either of those will require changing the prow job container running the build. So I need a couple of inputs here
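For background on the cidata requirement: cloud-init's NoCloud datasource looks for a volume labeled cidata containing user-data and meta-data files. A sketch of generating such a seed ISO with xorriso; the file contents are minimal placeholders, and the final command assumes xorriso is available in the build container:

```shell
# Sketch: build a NoCloud seed ISO labeled "cidata" for cloud-init.
# user-data/meta-data contents below are minimal placeholders.
printf '#cloud-config\n' > user-data
printf 'instance-id: iid-local01\n' > meta-data
# xorriso -as mkisofs -output cidata.iso -volid cidata -joliet -rock user-data meta-data
```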
Thanks for the patience @Sriraman Srinivasan. Been a bit busy and was on PTO. Will look into this soon.
Hey Folks, wanted to know if something changed recently since v0.1.13 regarding AMI generation. The names now generated for CAPA AMIs by image builder look like capa-ami-amazon-2-v1.25.3-1665727783 (notice the v prefix on the k8s release) as opposed to previous releases' capa-ami-amazon-2-1.25.2-00-1664536077. This is causing issues in CAPA for pushing AMI images, as there is some strict format checking. We could resolve this on the CAPA side, but just wanted to make sure: was this change intentional, or done by mistake?
I'm pretty sure it was intentional, to be in line with other providers. I recently looked up the change. Give me a min and I'll dig it out
Can we use an existing OVA image (one already verified by our security team) instead of an ISO, install the addons (k8s, CNI, container runtime, etc. to make it a conformant CAPI image) on top of that OVA using image-builder, and create a VM template out of it? Can someone guide me on this?
Do the image-builder office hours still happen? Doesn't look like there's been one since July according to the meeting notes.
I believe if people don’t put things on the agenda they will cancel the meeting
If no Agenda items are present the night before, the meeting will be canceled.
I haven't seen any messages about it being canceled either though. Which makes me wonder if it's just kinda died off.
yeah I’m not sure there. Last thing about office hours was from
I might try adding topics for next week and see what happens I guess 😆
@mboersma @jsturtevant @kiran keshavamurthy and I have been maintaining the project on a best effort basis (basically just keeping the lights on)
I personally have a conflict with the current 8am time and can’t make it most weeks
the project is a bit low on maintainers right now so if anyone has interest in stepping up to host the office hours please do
That's actually one of the things I wanted to discuss 🙂 image-builder is a pretty important project for us at Giant Swarm and we'd love to help out. There's a lot of improvements / fixes we'd like to see, but the current level of activity on the project makes that difficult. I'll get some topics added to the agenda for next week later today 🙂
We are in a similar position as well, image builder is pretty important to us as well. We also want to make improvements/changes.
Image-builder is important to us as well. I have to split time between upstream and downstream work so sometimes upstream work takes a hit.
I’m completely on board with changing the office hours to better suit everyone and love to get more maintainers
Just wanted to double check others are joining the office hours. I'm currently waiting in the call. 🙂
@kiran keshavamurthy @richcase @cecile Are any of you planning to join?
I personally have not had the bandwidth to be active enough as image-builder maintainer lately and I think it's time I officially step down to make space for new folks
I will make a PR soon to officially move myself to emeritus. I also nominate @mboersma as new maintainer, he's been a lot more involved in the project than I have lately and is interested in helping. cc @kiran keshavamurthy
I can no longer make the 8am meetings on Thursdays but would be open to meeting at another time. Should we get a doodle poll going for a new time, as it seems we have a bunch of new folks?
Thanks @cecile 🙂 I've opened a PR to add myself to the reviewers:
Hey folks, the newer k8s releases are only compatible with containerd 1.6.4+, but image-builder still defaults to 1.6.2.
I tried passing the parameters below so that a compatible containerd version is used while generating CAPA AMIs:
"containerd_version": "1.6.5",
"containerd_checksum": "cf02a2da998bfcf61727c65ede6f53e89052a68190563a1799a7298b0cea86b4",
"containerd_url": ""
but it's failing with the error below:
fatal: [default]: FAILED! => {"changed": true, "checksum_dest": null, "checksum_src": "8b354c7fcc59c66ce8ade0bc137782838709fa3c", "dest": "/tmp/containerd.tar.gz", "elapsed": 0, "msg": "The checksum for /tmp/containerd.tar.gz did not match 91f1087d556ecfb1f148743c8ee78213cd19e07c22787dae07fe6b9314bec121; it was cf02a2da998bfcf61727c65ede6f53e89052a68190563a1799a7298b0cea86b4.", "src": "/tmp/.ansible/ansible-tmp-1668598896.3124301-85962-228207629295306/tmpps0mndg2", "url": ""}
Although the checksum I set looks correct, I think it is somehow interfering with another CRI-related attribute.
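If it helps, the variable name the packer configs expect is containerd_sha256 (containerd_checksum isn't one I recognize from the repo, so treat that as my assumption), and leaving containerd_url empty lets the default download URL be derived from the version. A minimal fragment:

```json
{
  "containerd_version": "1.6.5",
  "containerd_sha256": "<sha256 of the matching containerd tarball>",
  "containerd_url": ""
}
```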
although at some point, we must update this default to a higher containerd version compatible with recent k8s versions
also, even after setting the right parameter I am getting the error below; could someone help solve this?
ubuntu-20.04: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"echo /tmp/.ansible\"&& mkdir \"echo /tmp/.ansible/ansible-tmp-1668600507.193056-96724-263289874289379\" && echo ansible-tmp-1668600507.193056-96724-263289874289379=\"echo /tmp/.ansible/ansible-tmp-1668600507.193056-96724-263289874289379\" ), exited with result 1", "unreachable": true}
Sorry, I was AFK yesterday. 🙂 Do you get the same error without making the changes to containerd? It doesn't look like it should be related from what I can tell. I can try to dig in a bit more later today and see if I can find the cause.
I'm trying this because I wanted to change the containerd version, but I didn't get this error when not specifying the containerd versions etc.
Were the 3 containerd values the only things you changed?
{
"kubernetes_series": "1.25",
"kubernetes_semver": "1.25.4",
"kubernetes_rpm_version": "1.25.4-0",
"kubernetes_deb_version": "1.25.4-00",
"kubernetes_source_type": "pkg",
"kubernetes_http_source": "",
"kubernetes_rpm_repo": "",
"kubernetes_rpm_gpg_key": "\" \"",
"kubernetes_rpm_gpg_check": "True",
"kubernetes_deb_repo": "\" kubernetes-xenial\"",
"kubernetes_deb_gpg_key": "",
"kubernetes_container_registry": "registry.k8s.io",
"kubernetes_load_additional_imgs": "false",
"kubeadm_template": "etc/kubeadm.yml",
"containerd_version": "1.6.6",
"containerd_sha256": "0212869675742081d70600a1afc6cea4388435cc52bf5dc21f4efdcb9a92d2ef",
"containerd_url": ""
}
This is the set of env vars I generally use to generate AMIs; the only thing added here is the containerd stuff
ok, made some progress: looks like I was using the wrong containerd URL, so I removed the url and it started working, but then failed at this point:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleFilterError: Version comparison failed: '<' not supported between instances of 'int' and 'str'
I am not sure if this version is supported or not; ideally I should be able to build with any containerd version
flatcar-stable: fatal: [default]: FAILED! => {"changed": false, "msg": "AnsibleFilterError: Version comparison failed: '<' not supported between instances of 'int' and 'str'"}
Any idea what task is throwing that error? Struggling to find where the version is checked 😕
sorry missed it 😄
TASK [containerd : Copy in containerd config file etc/containerd/config.toml] **
😕 That doesn't seem to do anything with the version.
Mind posting the ~5 tasks that completed in the run up to that error?
I think it might be complaining about a different "version" in that error. As far as I can see, containerd_version is used for 2 things: building the URL (which you're overriding anyway) and by goss to check that the right version was installed.
here you go
flatcar-stable: TASK [include_role : containerd] *
flatcar-stable:
flatcar-stable: TASK [containerd : download containerd] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Create a directory if it does not exist] *
flatcar-stable: ok: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : unpack containerd for Flatcar to /opt/bin] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : delete /opt/cni directory]
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : delete /etc/cni directory] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Creates unit file directory] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Create systemd unit drop-in file for containerd to run from /opt/bin] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Create containerd memory pressure drop in file]
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Create containerd max tasks drop in file] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Create containerd http proxy conf file if needed] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Creates containerd config directory] *
flatcar-stable: changed: [default]
flatcar-stable:
flatcar-stable: TASK [containerd : Copy in containerd config file etc/containerd/config.toml]
flatcar-stable: An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleFilterError: Version comparison failed: '<' not supported between instances of 'int' and 'str'
flatcar-stable: fatal: [default]: FAILED! => {"changed": false, "msg": "AnsibleFilterError: Version comparison failed: '<' not supported between instances of 'int' and 'str'"}
flatcar-stable:
flatcar-stable: PLAY RECAP **
flatcar-stable: default : ok=34 changed=25 unreachable=0 failed=1 skipped=145 rescued=0 ignored=0
flatcar-stable:
ah ha! I think it's this line: https://github.com/kubernetes-sigs/image-builder/blob/02df45969409c7f18f2cf7e63b70[…]i/ansible/roles/containerd/templates/etc/containerd/config.toml
Can you try setting kubernetes_semver to be v1.25.4 (hopefully that doesn't break elsewhere)
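A minimal var fragment for that suggestion (note the leading v on kubernetes_semver; kubernetes_series is shown only for context):

```json
{
  "kubernetes_series": "1.25",
  "kubernetes_semver": "v1.25.4"
}
```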
oh, at least the previous failure is gone now 😅 thanks @Marcus Noble, I will ping back if I get more errors 😉
Can I run the ansible scripts from the image-builder repo on their own against an OVA? Can someone guide me on this?
I do not want image-builder to create an OVA for me; I already have an OVA and I just want to install everything else on top of it, so that I can use that OVA to create a VM template.
there isn't a guide, but if you have the VM booted you can configure Ansible to connect to that VM and then run the ansible scripts in . You will have to set all the ansible variables yourself
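A rough sketch of what that could look like, assuming a reachable VM at an illustrative IP and user (the playbook path matches the images/capi layout; the -e vars shown are only a tiny subset of what image-builder normally passes, so treat them as placeholders):

```shell
# Hypothetical inventory pointing at the already-booted VM.
cat > inventory.ini <<'EOF'
default ansible_host=10.0.0.10 ansible_user=builder ansible_become=true
EOF

# Then, from images/capi in the image-builder repo, something like:
#   ansible-playbook -i inventory.ini ansible/node.yml \
#     -e packer_build_name=ova -e packer_builder_type=none \
#     -e kubernetes_semver=v1.26.2 ...
```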
Hii everyone :)) 👋
I attended the last capi office hours and found out that this project needs some contributors who can help.
I am a beginner in this space but am looking to contribute wherever and whatever I can.
I would love to know more about the project and can help with beginner friendly issues if there are any 👀.
Any kind of guidance will be great :))
thanks
Hey folks, I am trying to build CAPA AMI images, but I am getting the following ansible ssh error; could someone help fix this issue? It specifically started occurring recently; it used to work before. Maybe it has something to do with a macOS update, but I am not sure.
amazon-2: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 55506: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}
There is a comment there on how to bypass it. The PRs to fix it have been open for a while AFAIK, so I just had to use what is mentioned in the comment to move forward. Hope it works for you
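For anyone hitting the same thing, the workaround I've seen (assuming this matches the linked comment) is to re-enable ssh-rsa, which OpenSSH 8.8+ disables by default:

```shell
# Allow the legacy ssh-rsa host key / pubkey algorithms for the
# packer-managed ansible connection.
export ANSIBLE_SSH_ARGS="-oHostKeyAlgorithms=+ssh-rsa -oPubkeyAcceptedAlgorithms=+ssh-rsa"
```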
Hi team , need help on creating clusterapi image for kubevirt
Hello there 👋 Happy new year!
Is there any chance this issue still makes sense? Would it make sense to define the end goal for it? I can try to work something out for this
this would avoid passing it to image-builder, but we would still need it in the scripts? I think it probably makes sense to minimize passing the variables around. @mboersma any thoughts?
Hey, did anyone already try to run image-builder on a pre-existing image? More specifically, decoupling and reusing the ansible roles on a running VM.
Hey! With the latest minor releases of the 1.24, 1.25 and 1.26 series I'm running into build errors for GCP cluster-api images. 1.24.9, 1.25.5 and 1.26.0 work just fine; with 1.24.10, 1.25.6 and 1.26.1 I get the same error across all 3 builds:
ubuntu-2204: TASK [kubernetes : Install Kubernetes] **
ubuntu-2204: fatal: [default]: FAILED! => {"cache_update_time": 1674304180, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'kubelet=1.24.10-00' 'kubeadm=1.24.10-00' 'kubectl=1.24.10-00' 'kubernetes-cni=1.1.1-00'' failed: E: Unable to correct problems, you have held broken packages.\n", "rc": 100, "stderr": "E: Unable to correct problems, you have held broken packages.\n", "stderr_lines": ["E: Unable to correct problems, you have held broken packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSome packages could not be installed. This may mean that you have\nrequested an impossible situation or if you are using the unstable\ndistribution that some required packages have not yet been created\nor been moved out of Incoming.\nThe following information may help to resolve the situation:\n\nThe following packages have unmet dependencies:\n kubeadm : Depends: kubernetes-cni (>= 1.2.0)\n kubelet : Depends: kubernetes-cni (>= 1.2.0)\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Some packages could not be installed. This may mean that you have", "requested an impossible situation or if you are using the unstable", "distribution that some required packages have not yet been created", "or been moved out of Incoming.", "The following information may help to resolve the situation:", "", "The following packages have unmet dependencies:", " kubeadm : Depends: kubernetes-cni (>= 1.2.0)", " kubelet : Depends: kubernetes-cni (>= 1.2.0)"]}
{
...
"kubernetes_cni_deb_version": "1.1.1-00",
---
}
I was able to build it with these values:
{
"kubernetes_cni_deb_version": "**",
"build_name": "ubuntu-2204",
"distribution_release": "jammy",
"distribution_version": "2204"
}
This ensures the latest deb version is used from the package manager.
The latest nightly run is here: Prow Job
This runs this script: ci-gce-nightly.sh
I don't know much about it, but it seems to use this for the 1.26 builds, which still targets the 1.26.0 version: overwrite-1-26.json
We have run into a few cases where users get errors because the IB_VERSION env var isn't set when trying to build an image with our packer config. I see other configs using the same "ib_version": "{{env `IB_VERSION`}}", but nothing in the book calls out setting this. How are others 1) setting this environment variable for the user or 2) informing users to set it?
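What we do (a sketch of our own wrapper, not anything image-builder ships) is default the variable before invoking make, so the packer template's env lookup always resolves:

```shell
# Fall back to git metadata, then to a fixed placeholder, so the
# build never runs with an empty ib_version.
: "${IB_VERSION:=$(git describe --tags --dirty 2>/dev/null || echo dev)}"
export IB_VERSION
echo "building with IB_VERSION=${IB_VERSION}"
```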
Hi Everyone,
I just started experimenting with image-builder to build our own CAPZ Flatcar images.
For CAPZ, the init-sig.sh script is run to create the image definition, but the format is very hardcoded and not ideal (at least for our use case).
The generated format is capi-flatcar-stable-${FLATCAR_VERSION}-gen2, while ideally I would like something similar to what the CAPI images use, capi-flatcar-stable-${KUBERNETES_VERSION}, so I can then have an image version inside it for each version of flatcar
➜ az sig image-definition list-community --public-gallery-name flatcar4capi-742ef0cb-dcaa-4ecb-9cb0-bfd2e43dccc0 --location westeurope | jq '.[].name'
"flatcar-stable-amd64-capi-v1.23.13"
"flatcar-stable-amd64-capi-v1.24.6"
"flatcar-stable-amd64-capi-v1.24.9"
"flatcar-stable-amd64-capi-v1.25.4"
"flatcar-stable-amd64-capi-v1.25.6"
"flatcar-stable-amd64-capi-v1.26.0"
➜ az sig image-version list-community --location westeurope --public-gallery-name flatcar4capi-742ef0cb-dcaa-4ecb-9cb0-bfd2e43dccc0 --only-show-errors --gallery-image-definition flatcar-stable-amd64-capi-v1.24.9 -o table
ExcludeFromLatest Location Name PublishedDate UniqueId
------------------- ---------- -------- -------------------------------- --------------------------------------------------------------------------------------------------------------------------------
True westeurope 3374.2.1 2023-01-06T00:29:51.344093+00:00 /CommunityGalleries/flatcar4capi-742ef0cb-dcaa-4ecb-9cb0-bfd2e43dccc0/Images/flatcar-stable-amd64-capi-v1.24.9/Versions/3374.2.1
@cecile in case you know about it , i did not want to cross post in cluster-api-azure
🙏
hi @fc. what we do for Flatcar is we use image-builder to build the image, and then republish into a seperate gallery that has the desired structure that we want. The republishing is done by using the gallery-image-version-id as the image source
@Jeremi Piotrowski i have a follow up question to this 🙂
I am building directly with image-builder into a Community gallery and is working fine now but , in order to use my image, i need to specify something like
image:
computeGallery:
gallery: test-xxxx-820f-b52ca78f96e6
name: capi-flatcar-stable-1.24.9-gen2
plan:
offer: flatcar-container-linux-free
publisher: kinvolk
sku: stable-gen2
version: latest
or do we build the community gallery image with a community gallery image as a source?
my concern is that
Hello. Images in flatcar4capi are built from Flatcar VHDs imported into a SIG, so their advantage is that they don't require plan information. That's the big part of it.
There is also flatcar community gallery, which you can use as a source for building your images using image-builder. Let me dig up some sample JSON packer values we use for building the images.
yeah that make sense. thanks
> There is also flatcar community gallery, which you can use as a source for building your images using image-builder. Let me dig up some sample JSON packer values we use for building the images.
that is what I am doing 👍 through image-builder
From our release automation:
cat <<EOF
{
"sig_image_version": "${FLATCAR_VERSION}",
"kubernetes_semver": "${KUBERNETES_SEMVER}",
"image_name": "${IMAGE_NAME}",
"image_offer": "",
"image_publisher": "",
"image_sku": "",
"image_version": "",
"plan_image_offer": "",
"plan_image_publisher": "",
"plan_image_sku": "",
"source_sig_subscription_id": "${AZURE_SUBSCRIPTION_ID}",
"source_sig_resource_group_name": "${STAGING_SIG_RESOURCE_GROUP}",
"source_sig_name": "${FLATCAR_STAGING_GALLERY_NAME}",
"source_sig_image_name": "${FLATCAR_IMAGE_NAME}",
"source_sig_image_version": "${FLATCAR_VERSION}"
}
EOF
now i am just trying to validate the differences between the nodes i get with flatcar4capi and from image-builder
built images.
I can see that with the image from flatcar4capi i get containerd 1.6.14 while on the one from image-builder i get 1.6.2 ... do you customize / override that ?
No, we build from master branch of image-builder. I don't remember, but maybe in Flatcar it's using baked in containerd version as opposed to one installed by Ansible? So maybe the difference is Flatcar version used?
thanks for confirming that.
i am going through the ansible code and our pipelines now , maybe we are just out of date or something.
i can see the playbook eventually run does indeed specify 1.6.2
sig-flatcar-gen2: Executing Ansible: ansible-playbook -e packer_build_name="sig-flatcar-gen2" -e packer_builder_type=azure-arm --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url= containerd_sha256=91f1087d556ecfb1f148743c8ee78213cd19e07c22787dae07fe6b9314bec121 pause_image=k8s.gcr.io/pause:3.6 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.6.2
now i just need to walk back and find where and why those are set to 1.6.2 🙂
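In case it's useful, that pin can be overridden without patching the repo via an extra packer var file (PACKER_VAR_FILES is the hook image-builder's Makefile supports; the version and placeholder checksum below are illustrative):

```shell
cat > containerd-override.json <<'EOF'
{
  "containerd_version": "1.6.14",
  "containerd_sha256": "<sha256 of containerd-1.6.14-linux-amd64.tar.gz>"
}
EOF
# then:
#   PACKER_VAR_FILES=containerd-override.json make build-azure-sig-flatcar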
Cool. Feel free to ping me if you have some further questions. I was mainly driving the work on community SIG for Flatcar and I'm always happy to help (or redirect to team members which might be more knowledgeable than me) 🙂
hi @Mateusz Gozdek (invidian), sorry to ping you again, one question
you said you build VHD using image-builder then import into SIG
I wanted to try the same, but when I look at , I don't see flatcar in the list of VHD-supported targets.
Do you have a patch or something on top of image-builder?
If the code for your build pipelines is available on GitHub I'd love to take a look 🙂
thanks 🙏
The script is not public yet, as it's a first version and we didn't put much effort into it yet, but yeah, eventually it will be public. This is how it looks: .
As you won't have access to the storage account which holds the Flatcar VHD images, you probably need to download a VHD from the Flatcar release, upload it to a storage account of your own, and then you should be able to use the script above.
It's nothing fancy really.
unrelated , to an extent , question.
do you know why the .4 of the stable channel does not exist in azure as a vm image yet ?
➜ az vm image list --publisher kinvolk --sku stable-gen2 --offer flatcar-container-linux-free --all -o json | jq -r '[.[].version] | sort_by( values | split(".") | map(tonumber) ) | .[-1]'
3374.2.3
I guess the image has not been approved yet, which is odd. Maybe @Kai Lüke or @Jeremi Piotrowski know what's the status of it?
BTW, is the image available in community gallery? I can make it so if it's also missing.
in the flatcar4capi gallery I can only see .1; let me try to find the name of the flatcar community gallery
in the flatcar one i see the same, .3 as latest
λ az sig image-version list-community --location westeurope --public-gallery-name flatcar-23485951-527a-48d6-9d11-6931ff0afc2e --only-show-errors --gallery-image-definition flatcar-stable-amd64 | jq '.[].name'
"3374.2.0"
"3374.2.1"
"3374.2.3"
nice thanks 🙏 i will need to see to change my pipeline to use the gallery rather than the vm image as source
Yeah azure publishing got delayed this time round and is taking longer as well
@Mateusz Gozdek (invidian) thanks for all your help, i switched to the community gallery "/CommunityGalleries/flatcar-23485951-527a-48d6-9d11-6931ff0afc2e/Images/flatcar-stable-amd64/Versions/3374.2.4"
just a question, how official is this gallery when compared to the flatcar Marketplace offer ?
It's official (), but you would probably be an early adopter. We hope it will enable Flatcar users on Azure to get faster access to latest image versions (because of easier and more automated release process) and without requirement of accepting plans.
Right now image publishing there is not wired to the CI, so there might be still a delay until the images show up, but when we see people using it, I'm sure CI will be prioritized to have the process fully automated.
nice thanks
> without requirement of accepting plans.
So using the Community Gallery as my source rather than the Marketplace offer means I don't have to accept terms in every subscription? nice
> We hope it will enable Flatcar users on Azure to get faster access to latest image versions
👍 exactly why i switched
> Right now image publishing there is not wired to the CI
🙏 🤞
> I'm sure CI will be prioritized to have the process fully automated.
I'll make sure images are published as soon as I see a feed as well then. It's probably time to enable replication to all regions then as well.
> Right now image publishing there is not wired to the CI
I do wonder though if, when that time comes, I should stop building my own images ... I mean here https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2890/files?short_path=91b8a4f#diff-91b8a4f39cd0f7ee28f[…]67069103bd0fa528e84cb4d3 you do mention that those are just reference images ... but I am literally just rebuilding the same thing 🙂
Yes, CAPI images we publish are reference images for testing and CI use with no regular/security updates guarantees. This is the same for all CAPI images available from the maintainers. So users are recommended to build their own images with versions they need etc.
@Mateusz Gozdek (invidian) i feel really bad for asking for this ... but ... any chance you could push 3374.2.5 to flatcar community gallery ? 🙏
🙏 Thanks, much appreciated ... it's the last bit I needed; I'll check that my pipeline picks it up tonight 🤞
Hi Everyone !
I have a question about releases on
the last release is 0.1.12 from May 2022; there is also a tag 0.1.13 from Sep 2022 (which is what I am using for my builds)
but since then a lot of changes have been pushed, including some that are a requirement for recent versions of kubernetes like ( )
Do you think it's worth cutting a new release, to also limit the amount of change between releases?
Similarly, the latest container image for the image-builder is ancient as well, it is missing all the latest make targets (for instance no Ubuntu 22.04).
and, I did not check, but I hope the az CLI will be more up to date in the master branch (or I will do a PR for it), since a bunch of commands fail with the current version 🙂
Do you know what version is in the tarball that is linked in the CAPI Image-builder documentation? Is that then 0.1.12 or does it include all commits since?
no, I don't know, but I don't use 0.1.12; I use the tag 0.1.13 (which did not get a release, though I'm not sure why)
i know flatcar upstream uses master ...
Oh my, that could explain my struggles getting a working Ubuntu OVA. I'll have to check...
Yeah, the tarball is based on master including all commits (it uses github's tarball api endpoint)
@kiran keshavamurthy @mboersma Do you think we could get a new release cut sometime soon?
Yes, I can create a new tag today or tomorrow. Do we need any open PRs to get in before creating a new tag?
@fc I think you still have one open that would be good to go in don’t you?
Just need an approval https://github.com/kubernetes-sigs/image-builder/pull/1087
Yes , it got assigned and is a very simple one so hopefully it will make it 👍
@kiran keshavamurthy Is there docs or maybe we can pair for the release? I don't know that process
@jsturtevant As discussed I’ll update the docs on how to tag and release
I’ve created a tag for v0.1.14 today.
@Sanika Gawhane Can you create a PR to promote the container image from staging to prod please?
Hi everyone, did you ever see an error like this when running make build-ami-flatcar for a flatcar AMI? Any guidance on what to check? I am using the v0.1.13 branch. cc @swan
amazon-ebs.{{user build_name}}: TASK [kubernetes : unpack crictl] *
amazon-ebs.{{user build_name}}: changed: [default]
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Remove crictl tarball] *
amazon-ebs.{{user build_name}}: changed: [default]
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Create kubelet default config file] *
amazon-ebs.{{user build_name}}: changed: [default]
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Enable kubelet service] *
amazon-ebs.{{user build_name}}: fatal: [default]: FAILED! => {"changed": false, "msg": "Could not find the requested service kubelet: host"}
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: PLAY RECAP **
amazon-ebs.{{user build_name}}: default : ok=45 changed=34 unreachable=0 failed=1 skipped=166 rescued=0 ignored=0
amazon-ebs.{{user build_name}}:
having trouble running AMI build right now, do you have a full log?
I tried on master and v0.1.14; it is even worse, with "ansible-playbook: error: argument --scp-extra-args: expected one argument":
no_proxy=** make build-ami-flatcar
hack/ensure-ansible.sh
Starting galaxy collection install process
....
==> amazon-ebs.{{userbuild_name}}: Executing Ansible: ansible-playbook -e packer_build_name="flatcar-stable" -e packer_builder_type=amazon-ebs --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url= containerd_sha256=8e227caa318faa136e4387ffd6f96baeaad5582d176202fe9da69cde87036033 pause_image=registry.k8s.io/pause:3.9 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.6.8 containerd_wasm_shims_url= containerd_wasm_shims_version=v0.3.3 containerd_wasm_shims_sha256=da84b1c065a58f95a841d39e143cd7115d43e6faedcce7a8782f2942388260d7 containerd_wasm_shims_runtimes="" crictl_url= crictl_sha256= crictl_source_type=http custom_role_names="" firstboot_custom_roles_pre="" firstboot_custom_roles_post="" node_custom_roles_pre="" node_custom_roles_post="" disable_public_repos=false extra_debs="" extra_repos="" extra_rpms="" http_proxy= https_proxy= kubeadm_template=etc/kubeadm.yml kubernetes_cni_http_source= kubernetes_cni_http_checksum=sha256: kubernetes_http_source= kubernetes_container_registry=registry.k8s.io kubernetes_rpm_repo= kubernetes_rpm_gpg_key=" " kubernetes_rpm_gpg_check=True kubernetes_deb_repo=" kubernetes-xenial" kubernetes_deb_gpg_key= kubernetes_cni_deb_version=1.2.0-00 kubernetes_cni_rpm_version=1.2.0-0 kubernetes_cni_semver=v1.2.0 kubernetes_cni_source_type=http kubernetes_semver=v1.26.2 kubernetes_source_type=pkg kubernetes_load_additional_imgs=false kubernetes_deb_version=1.26.2-00 kubernetes_rpm_version=1.26.2-0 no_proxy= pip_conf_file= python_path=/opt/bin/builder-env/site-packages redhat_epel_rpm= epel_rpm_gpg_key= reenable_public_repos=true remove_extra_repos=false systemd_prefix=/etc/systemd sysusr_prefix=/opt sysusrlocal_prefix=/opt load_additional_components=false additional_registry_images=false additional_registry_images_list= additional_url_images=false additional_url_images_list= additional_executables=false additional_executables_list= 
additional_executables_destination_path= build_target=virt amazon_ssm_agent_rpm= --extra-vars ansible_python_interpreter=/opt/bin/python --extra-vars --scp-extra-args -O -e ansible_ssh_private_key_file=/var/folders/vz/sz23dk7j35x9hkd69x2208s40000gq/T/ansible-key3776119403 -i /var/folders/vz/sz23dk7j35x9hkd69x2208s40000gq/T/packer-provisioner-ansible3049382186 /Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/ansible/node.yml
amazon-ebs.{{userbuild_name}}: usage: ansible-playbook [-h] [--version] [-v] [-k]
amazon-ebs.{{userbuild_name}}: [--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
amazon-ebs.{{userbuild_name}}: [-c CONNECTION] [-T TIMEOUT]
amazon-ebs.{{userbuild_name}}: [--ssh-common-args SSH_COMMON_ARGS]
amazon-ebs.{{userbuild_name}}: [--sftp-extra-args SFTP_EXTRA_ARGS]
amazon-ebs.{{userbuild_name}}: [--scp-extra-args SCP_EXTRA_ARGS]
amazon-ebs.{{userbuild_name}}: [--ssh-extra-args SSH_EXTRA_ARGS] [--force-handlers]
amazon-ebs.{{userbuild_name}}: [--flush-cache] [-b] [--become-method BECOME_METHOD]
amazon-ebs.{{userbuild_name}}: [--become-user BECOME_USER] [-K] [-t TAGS]
amazon-ebs.{{userbuild_name}}: [--skip-tags SKIP_TAGS] [-C] [--syntax-check] [-D]
amazon-ebs.{{userbuild_name}}: [-i INVENTORY] [--list-hosts] [-l SUBSET]
amazon-ebs.{{userbuild_name}}: [-e EXTRA_VARS] [--vault-id VAULT_IDS]
amazon-ebs.{{userbuild_name}}: [--ask-vault-password | --vault-password-file VAULT_PASSWORD_FILES]
amazon-ebs.{{userbuild_name}}: [-f FORKS] [-M MODULE_PATH] [--list-tasks]
amazon-ebs.{{userbuild_name}}: [--list-tags] [--step] [--start-at-task START_AT_TASK]
amazon-ebs.{{userbuild_name}}: playbook [playbook ...]
amazon-ebs.{{userbuild_name}}: ansible-playbook: error: argument --scp-extra-args: expected one argument
......
==> amazon-ebs.{{userbuild_name}}: Provisioning step had errors: Running the cleanup provisioner, if present...
==> amazon-ebs.{{userbuild_name}}: Terminating the source AWS instance...
ansible error can be addressed by following:
export ANSIBLE_SCP_EXTRA_ARGS="-O"
export ANSIBLE_SSH_ARGS="-oHostKeyAlgorithms=+ssh-rsa -oPubkeyAcceptedAlgorithms=+ssh-rsa"
I re-read the errors on master; it seems it was just the workaround export ANSIBLE_SCP_EXTRA_ARGS="-O" that brought on the error: Executing Ansible: ansible-playbook -e packer_build_name="flatcar-stable" -e packer_builder_type=amazon-ebs --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url=
ansible-playbook: error: argument --scp-extra-args: expected one argument
So I removed the ANSIBLE_SCP_EXTRA_ARGS env var and it can continue now.
Still the same failure for build-ami-flatcar currently on master/v0.1.14:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Enable kubelet service] *
amazon-ebs.{{user `build_name`}}: fatal: [default]: FAILED! => {"changed": false, "msg": "Could not find the requested service kubelet: host"}
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: PLAY RECAP ***
amazon-ebs.{{user `build_name`}}: default : ok=49 changed=36 unreachable=0 failed=1 skipped=210 rescued=0 ignored=0
when: kubernetes_source_type == "http" and kubernetes_cni_source_type == "http"
if you just care about getting an image this should work too:
diff --git a/images/capi/ansible/roles/kubernetes/tasks/main.yml b/images/capi/ansible/roles/kubernetes/tasks/main.yml
index 36d973b39..55885f1ef 100644
--- a/images/capi/ansible/roles/kubernetes/tasks/main.yml
+++ b/images/capi/ansible/roles/kubernetes/tasks/main.yml
@@ -21,6 +21,9 @@
- import_tasks: photon.yml
when: kubernetes_source_type == "pkg" and ansible_os_family == "VMware Photon OS"
+- import_tasks: url.yml
+ when: ansible_os_family == "Flatcar"
+
- name: Symlink cri-tools
file:
src: "/usr/local/bin/{{ item }}"
Thanks @Jeremi Piotrowski, I added kubernetes_source_type == "http" and kubernetes_cni_source_type == "http" to my config json and then it works. I wasn't aware of these two settings before. Thanks a lot!
It shouldn’t be needed since it’s part of the flatcar config that is included before your file
I just use this config json, but I didn't add the below two lines before (wasn't aware of them):
cat config-1.26.2-flatcar.json
{
"kubernetes_series": "1.26",
"kubernetes_semver": "v1.26.2",
"kubernetes_rpm_version": "1.26.2-0",
"kubernetes_deb_version": "1.26.2-00",
"kubernetes_source_type": "pkg",
"kubernetes_http_source": "",
"kubernetes_rpm_repo": "",
"kubernetes_rpm_gpg_key": "\" \"",
"kubernetes_rpm_gpg_check": "True",
"kubernetes_deb_repo": "\" kubernetes-xenial\"",
"kubernetes_deb_gpg_key": "",
"kubernetes_container_registry": "registry.k8s.io",
"kubernetes_load_additional_imgs": "false",
"kubeadm_template": "etc/kubeadm.yml",
"containerd_version": "1.6.8",
"containerd_sha256": "8e227caa318faa136e4387ffd6f96baeaad5582d176202fe9da69cde87036033",
#"kubernetes_source_type": "http",
#"kubernetes_cni_source_type": "http"
}
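For anyone hitting the same kubelet-service failure on Flatcar, the fix discussed above is to switch both source types to "http" in the config json. A minimal sketch (JSON has no comments, so the two keys must replace the "pkg" values rather than sit commented out at the bottom; the remaining keys from the config above still apply):

```json
{
  "kubernetes_series": "1.26",
  "kubernetes_semver": "v1.26.2",
  "kubernetes_source_type": "http",
  "kubernetes_cni_source_type": "http"
}
```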
[openstack Built CAPI Openstack Image breaking in kubeadm init command]
I'm building a Kubernetes image for OpenStack. The build process runs ok, the image is built, and I'm able to provision an instance with it. The problem is that apparently cloud-init is running a kubeadm init command with the wrong kubeadm.yaml config; it looks like it is overwriting the config with a file in /run/kubeadm/kubeadm.yaml. This is causing the coredns pull to fail and breaking the provisioning process.
I've changed the registry value in the following files:
cloudinit/user-data: imageRepository: registry.k8s.io
packer/config/kubernetes.json: "kubernetes_container_registry": "registry.k8s.io",
packer/qemu/packer.json: "kubernetes_container_registry": "registry.k8s.io",
packer_kubernetes.json: "kubernetes_container_registry": "registry.k8s.io",
Which version of kube-builder are you using? This should have been resolved with
I'm using the latest release; I just checked now and there are some tags with other releases, so I'll update with all the changes. But we already had these configs inside the repo.
The thing is that cloud-init is apparently running a kubeadm init command that overrides this configuration from user-data.
Is it that the kubeadm version being used is too old? Which provider is used for Openstack in image-builder?
Just built from the last tag 0.1.14 and got the same error. It seems that kubeadm is writing its own kubeadm config file.
Sorry, just got a chance to look at this. It seems that the containerd_version isn't specified for that target, so it relies on whatever is installed from Ubuntu 20.04, I think. I guess we need to specify the exact version to use; I'm not sure which version is best.
Some updates on this. CAPI was overwriting the container registry address, so it was an issue on my side. But the 1.26.1 version of crictl is breaking a 1.22.9 k8s image, for example. I think that can be related to containerd_version.
Possibly, I get quite confused with all the inter-dependencies 😅
Hello Image-builder maintainers, I have a question. The packer.json for AMI builds has a throughput field by default in the block device mappings section with a default value of 125, but this field is valid only for gp3 volumes. I tried building a gp2 Ubuntu AMI by overriding the throughput from a different JSON var-file by setting it to "" or even null , but in both cases the build failed saying
Throughput is not available for device /dev/sda1. I was able to build it only after removing the throughput field from the packer.json. But removal also doesn't seem to be a solution, because then there's no way of passing throughput in if I do want to build a gp3 volume. AMI missing something (pun intended 😛)? Is this a bug?
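One possible stopgap until the template handles this, sketched as a hypothetical workaround (the exact location of the throughput key inside packer.json may differ in your checkout), is to strip the gp3-only field with jq before a gp2 build instead of editing the file by hand:

```shell
# Hypothetical workaround: recursively delete every "throughput" key so
# packer doesn't send the gp3-only field when building a gp2 AMI.
# Requires jq 1.6+ for the walk() builtin.
jq 'walk(if type == "object" then del(.throughput) else . end)' packer.json > packer-gp2.json
```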
Hello everyone, @kiran keshavamurthy and I are trying to bump the upstream image-builder to the latest tag/release v0.1.14 . Kiran created the tag and release successfully. However, the container build that was automatically kicked off pushed the container to the registry without a tag.
Are we missing something here? Does anything in cloudbuild.yaml need to be modified while creating the tag/release?
Hey y'all 👋 are any of the other contributors and maintainers of image-builder going to be at Kubecon next month in Amsterdam? I'd be interested in a meetup to discuss the project, pain points, future, etc. if anyone was keen. Maybe at the contributor summit? (/cc @kiran keshavamurthy @jsturtevant @mboersma)
I won't either, but I very much endorse having a summit meeting about image-builder and establishing some future directions, etc.
I think a restart of the office hours would be useful.
agreed, we have had quite a bit of activity in the last few months.
In other sigs we used a doodle to collect times, or we can set one that works for the majority of the maintainers. thoughts?
I think a doodle would be useful. I think we have quite a split between US and EU timezones so I think it might be hard to find a time that pleases everyone.
Sure, but I'll pick it up on Monday. 🙂
Wow it's been a long time since I've used Doodle. It's got so many ads these days 🙈
Anyway, before I share into the main channel I'd like to get your thoughts first so here it is:
I went with constraints of no earlier than 8am PDT and no later than 7pm GMT. Hopefully that's good enough for most people.
Cool. I'll post it into the channel then. I'm thinking deadline for the end of this month to give people some time to answer but not have it drag on too long.
The days that have times today are greyed out since they are in the past 🤨. Might need to put the dates in April.
🤦♂️ I'll try and update it. It might cause the answers to need re-doing.
Actually, I'm just going to create a new one and update the link. I'll let you know when it's up so you can answer again. Sorry about that.
Just a reminder that we're looking for people's thoughts on the best day/time to run the office hours. Could all those who might be interested in attending at some point please fill out the above Doodle poll with your favoured slot. 🙏
Just one last reminder asking for people's thoughts on this. 🙂 I'm going to close the poll tomorrow afternoon (UTC).
So far, Monday at 3:30pm UTC looks to be the favoured new slot for the office hours.
hey folks,
we are running into some issues with the ubuntu2204 images, namely with cloud-init. Maybe someone has seen it before 😅
That error isn't present at every start, but on about 70% of them
Can you log into the node and check the cloud-init logs in /var/log?
I'm curious to know what you're seeing in there, as I've run into an issue in the past few days since I synced with upstream, and it looks like cloud-init might be failing. It may be purely coincidental though, and I'm not sure if it's a problem with image-builder, as I've tested using an image that was built before I started having this issue.
It was previously working and is currently not, making me wonder if this could be a problem with CAPI itself. I'm still looking into this on my side.
I forgot to update you here. It was just a configuration issue on my side causing the cloud-init issue :picard_facepalm:. So it's very likely my fix won't help you. However, if it's failing, I do recommend checking the cloud-init logs to see what's going on in there.
Hi 👋
I am preparing a PR for ubuntu-22.04 EFI on qemu, which requires a user-data file that is different from the one in ova. Can you advise on where to put that file / how to name it differently, e.g. prefix it with qemu-?
Hello!
I'd say stick the PR in as a draft and then have the discussion in there. It'll give a "paper trail" for anyone else who may need to do something similar in future and helps any reviewers make decisions based on the conversation that happens in there. I recently had a pretty large PR go in that had something like 94 entries in the conversation tab (minus a few bot entries) but it was great to have it in there as anyone coming into the conversation had the history.
Right, closing the above poll. Based on the votes it looks like Mondays at 3:30pm UTC is the most favoured time slot for the office hours!
Do you have a date for when the first one will take place? (I only ask as the link takes me to March 30th 😉 )
Will you also drop a link in here for access to the meeting? It's going to be a first for me so not sure where to go for any of it 🙂
Thanks!
Unsure right now. We wanted to find a suitable slot first.
Next Monday, April 10? I can be there.
I'm not sure who has access to update the calendar and zoom meeting time? @kiran keshavamurthy?
The next two Mondays are no good for me (UK bank holiday then Kubecon) but would be happy for it to go ahead without me 🙂
oh that's a good point - both of those things are applicable to me too. Baby brain is a killer.
We can always wait a week or two if that's better timing. First we should sort out the calendar so the new time is publicly visible.
I created the kubernetes/community PR to change the meeting time here:
(However, when I changed the time previously for CAPZ office hours, this wasn't sufficient to actually make the change on the public calendar, we may have to nudge someone...)
How do we go about getting this added to the public calendar?
Doesn't look like it I don't think. Unless my GCal isn't updating it? thinking
This seems to suggest a specific calendar needs to be created and shared. Not sure if that's still correct or not -
Hi Team, I built a RHEL 8 worker node AMI using image-builder, and when I manually created a nodegroup for an existing EKS cluster using an ASG, the nodes didn't get attached to the cluster. Upon further investigation I see some of the necessary scripts are missing from the image, like /etc/eks/bootstrap.sh, etc. Wondering if it's possible to build a worker node AMI using image-builder, and if so, are there any further customizations that I have to perform?
Historically, image-builder was designed for kubeadm-based clusters, of which EKS (to the best of my knowledge) is not one. I would recommend looking at
Hi, we build our capz flatcar images from the flatcar community library as a base but the 3510.2.0 is not available yet
pinging @Mateusz Gozdek (invidian) for reference ( since in the past you triggered the push )
➜ az sig image-version list-community --location westeurope --public-gallery-name flatcar-23485951-527a-48d6-9d11-6931ff0afc2e --only-show-errors --gallery-image-definition flatcar-stable-amd64 -o json | jq '.[].name'
"3374.2.0"
"3374.2.1"
"3374.2.3"
"3374.2.4"
"3374.2.5"
@Marcus Noble @jsturtevant @mboersma
Thought you may be interested in this based on the earlier discussion during office hours:
@mboersma I just noticed that the image-builder event is now correct in the Kubernetes calendar! 😄 Was that your doing?
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 2:30PM every other Monday (next occurrence is May 8th), Greenwich Mean Time.
Hi, I'm new here, so hello everyone. As part of my activities around sig-windows-dev-tools, I'm happy to share that after several days of attempts I've completed what I think is my first successful run of image-builder generating a Windows Server image, with the builder running from WSL. I do realise WSL is not a typical environment for image-builder users, but I just wanted to share that it is feasible to use WSL.
Hi folks, when I use the v0.1.14 tag branch to build an ami, I hit an issue:
amazon-ebs.{{user `build_name`}}: fatal: [default]: FAILED! => {"changed": true, "cmd": "kubeadm config images pull --config /etc/kubeadm.yml --cri-socket /var/run/containerd/containerd.sock", "delta": "0:00:00.036572", "end": "2023-05-05 03:06:53.596128", "msg": "non-zero return code", "rc": 1, "start": "2023-05-05 03:06:53.559556", "stderr": "your configuration file uses an old API spec: \"kubeadm.k8s.io/v1beta2\". Please use kubeadm v1.22 instead and run 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["your configuration file uses an old API spec: \"kubeadm.k8s.io/v1beta2\". Please use kubeadm v1.22 instead and run 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}
Any idea? Thanks!
Full log:
no_proxy=** make build-ami-ubuntu-2004
hack/ensure-ansible.sh
Starting galaxy collection install process
Nothing to do. All requested collections are already installed. If you want to reinstall them, consider using --force.
hack/ensure-ansible-windows.sh
IMPORTANT: Winrm connection plugin for Ansible on MacOS causes connection issues.
See for more details.
To fix the issue provide the environment variable 'no_proxy='
Example call to build Windows images on MacOS: 'no_proxy= make build-'
hack/ensure-packer.sh
hack/ensure-goss.sh
Right version of binary present
packer build -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/kubernetes.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/cni.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/containerd.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/wasm-shims.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/ansible-args.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/goss-args.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/common.json" -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/config/additional_components.json" -color=true -var-file="/Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/packer/ami/ubuntu-2004.json" -var-file="/Users/yikew/Working/capa/image-builder/1.27/config.json" packer/ami/packer.json
amazon-ebs.{{user `build_name`}}: output will be in this color.
==> amazon-ebs.{{user `build_name`}}: Prevalidating any provided VPC information
==> amazon-ebs.{{user `build_name`}}: Prevalidating AMI Name: capa-ami-ubuntu-20.04-v1.27.0-1683255429
amazon-ebs.{{user `build_name`}}: Found Image ID: ami-0481e8ba7f486bd99
==> amazon-ebs.{{user `build_name`}}: Creating temporary keypair: packer_64547086-412f-e577-58a5-b431924ffd0d
==> amazon-ebs.{{user `build_name`}}: Creating temporary security group for this instance: packer_6454708e-4b3f-2bec-e059-ec5d99783a1b
==> amazon-ebs.{{user `build_name`}}: Authorizing access to port 22 from [0.0.0.0/0] in the temporary security groups...
==> amazon-ebs.{{user `build_name`}}: Launching a source AWS instance...
amazon-ebs.{{user `build_name`}}: Instance ID: i-00e09efbb79654a7b
==> amazon-ebs.{{user `build_name`}}: Waiting for instance (i-00e09efbb79654a7b) to become ready...
==> amazon-ebs.{{user `build_name`}}: Using SSH communicator to connect: 44.202.183.215
==> amazon-ebs.{{user `build_name`}}: Waiting for SSH to become available...
==> amazon-ebs.{{user `build_name`}}: Connected to SSH!
==> amazon-ebs.{{user `build_name`}}: Provisioning with shell script: /var/folders/vz/sz23dk7j35x9hkd69x2208s40000gq/T/packer-shell604033847
==> amazon-ebs.{{user `build_name`}}: Provisioning with shell script: ./packer/files/flatcar/scripts/bootstrap-flatcar.sh
==> amazon-ebs.{{user `build_name`}}: Provisioning with Ansible...
amazon-ebs.{{user `build_name`}}: Setting up proxy adapter for Ansible....
==> amazon-ebs.{{user `build_name`}}: Executing Ansible: ansible-playbook -e packer_build_name="ubuntu-20.04" -e packer_builder_type=amazon-ebs --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url= containerd_sha256=1d86b534c7bba51b78a7eeb1b67dd2ac6c0edeb01c034cc5f590d5ccd824b416 pause_image=registry.k8s.io/pause:3.9 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.6.20 containerd_wasm_shims_url= containerd_wasm_shims_version=v0.3.3 containerd_wasm_shims_sha256=da84b1c065a58f95a841d39e143cd7115d43e6faedcce7a8782f2942388260d7 containerd_wasm_shims_runtimes="" crictl_url= crictl_sha256= crictl_source_type=pkg custom_role_names="" firstboot_custom_roles_pre="" firstboot_custom_roles_post="" node_custom_roles_pre="" node_custom_roles_post="" disable_public_repos=false extra_debs="" extra_repos="" extra_rpms="" http_proxy= https_proxy= kubeadm_template=etc/kubeadm.yml kubernetes_cni_http_source= kubernetes_cni_http_checksum=sha256: kubernetes_http_source= kubernetes_container_registry=registry.k8s.io kubernetes_rpm_repo= kubernetes_rpm_gpg_key=" " kubernetes_rpm_gpg_check=True kubernetes_deb_repo=" kubernetes-xenial" kubernetes_deb_gpg_key= kubernetes_cni_deb_version=1.2.0-00 kubernetes_cni_rpm_version=1.2.0-0 kubernetes_cni_semver=v1.2.0 kubernetes_cni_source_type=pkg kubernetes_semver=v1.27.0 kubernetes_source_type=pkg kubernetes_load_additional_imgs=false kubernetes_deb_version=1.27.0-00 kubernetes_rpm_version=1.27.0-0 no_proxy= pip_conf_file= python_path= redhat_epel_rpm= epel_rpm_gpg_key= reenable_public_repos=true remove_extra_repos=false systemd_prefix=/usr/lib/systemd sysusr_prefix=/usr sysusrlocal_prefix=/usr/local load_additional_components=false additional_registry_images=false additional_registry_images_list= additional_url_images=false additional_url_images_list= additional_executables=false additional_executables_list= additional_executables_destination_path= build_target=virt 
amazon_ssm_agent_rpm= --extra-vars --extra-vars --scp-extra-args "-O" -e ansible_ssh_private_key_file=/var/folders/vz/sz23dk7j35x9hkd69x2208s40000gq/T/ansible-key68078117 -i /var/folders/vz/sz23dk7j35x9hkd69x2208s40000gq/T/packer-provisioner-ansible1682039367 /Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/ansible/node.yml
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: PLAY [all] ******************************************************************************************************************************************
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [Gathering Facts] ******************************************************************************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [include_role : node] **********************************************************************************************************
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [setup : Put templated sources.list in place] **********************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [setup : Put templated apt.conf.d/90proxy in place when defined] ********************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [setup : perform a dist-upgrade] ************************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [setup : install baseline dependencies] **********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [setup : install extra debs] ********************************************************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [setup : install pinned debs] ******************************************************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Ensure overlay module is present] ******************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Ensure br_netfilter module is present] ********************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Persist required kernel modules] ********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Set and persist kernel params] ************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'net.bridge.bridge-nf-call-iptables', 'val': 1})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'net.bridge.bridge-nf-call-ip6tables', 'val': 1})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'net.ipv4.ip_forward', 'val': 1})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'net.ipv6.conf.all.forwarding', 'val': 1})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'net.ipv6.conf.all.disable_ipv6', 'val': 0})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'net.ipv4.tcp_congestion_control', 'val': 'bbr'})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'vm.overcommit_memory', 'val': 1})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'kernel.panic', 'val': 10})
amazon-ebs.{{user `build_name`}}: changed: [default] => (item={'param': 'kernel.panic_on_oops', 'val': 1})
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Ensure auditd is running and comes on at reboot] ************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : configure auditd rules for containerd] ********************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Ensure reverse packet filtering is set as strict] **********************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Copy udev etcd network tuning rules] ************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [node : Copy etcd network tuning script] ********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [include_role : providers] ************************************************************************************************
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : include_tasks] **********************************************************************************************
amazon-ebs.{{user `build_name`}}: included: /Users/yikew/Projects/src/github.com/kubernetes-sigs/image-builder/images/capi/ansible/roles/providers/tasks/aws.yml for default
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : upgrade pip to latest] ******************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : install aws clients] **********************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : install aws agents Ubuntu] **********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Ensure ssm agent is running Ubuntu] ****************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Disable Hyper-V KVP protocol daemon on Ubuntu] ******************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Creates unit file directory for cloud-final] **********************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Create cloud-final boot order drop in file] ************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Creates unit file directory for cloud-config] ********************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Create cloud-final boot order drop in file] ************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Make sure all cloud init services are enabled] ******************************
amazon-ebs.{{user `build_name`}}: ok: [default] => (item=cloud-final)
amazon-ebs.{{user `build_name`}}: ok: [default] => (item=cloud-config)
amazon-ebs.{{user `build_name`}}: ok: [default] => (item=cloud-init)
amazon-ebs.{{user `build_name`}}: ok: [default] => (item=cloud-init-local)
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Create cloud-init config file] **************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : set cloudinit feature flags] ******************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [providers : Ensure chrony is running] ************************************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [include_role : containerd] **********************************************************************************************
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Install libseccomp2 package] ****************************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : download containerd] ********************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Create a directory if it does not exist] ****************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : unpack containerd] ************************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : delete /opt/cni directory] ********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : delete /etc/cni directory] ********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Creates unit file directory] ****************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Create containerd memory pressure drop in file] **************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Create containerd max tasks drop in file] **************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Create containerd http proxy conf file if needed] **********************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Creates containerd config directory] ************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Copy in containerd config file etc/containerd/config.toml] ******
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : Copy in crictl config] ****************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : start containerd service] **********************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : delete tarball] ******************************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [containerd : delete tarball] ******************************************************************************************
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [include_role : kubernetes] **********************************************************************************************
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Add the Kubernetes repo key] ****************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Add the Kubernetes repo] ************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Install Kubernetes] **********************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Symlink cri-tools] ************************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default] => (item=ctr)
amazon-ebs.{{user `build_name`}}: changed: [default] => (item=crictl)
amazon-ebs.{{user `build_name`}}: changed: [default] => (item=critest)
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Create kubelet default config file] **************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Enable kubelet service] **************************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Create the Kubernetes version file] **************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Check if Kubernetes container registry is using Amazon ECR] ******
amazon-ebs.{{user `build_name`}}: ok: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Create kubeadm config file] ******************************************************************
amazon-ebs.{{user `build_name`}}: changed: [default]
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: TASK [kubernetes : Kubeadm pull images] ********************************************************************************
amazon-ebs.{{user `build_name`}}: fatal: [default]: FAILED! => {"changed": true, "cmd": "kubeadm config images pull --config /etc/kubeadm.yml --cri-socket /var/run/containerd/containerd.sock", "delta": "0:00:00.036572", "end": "2023-05-05 03:06:53.596128", "msg": "non-zero return code", "rc": 1, "start": "2023-05-05 03:06:53.559556", "stderr": "your configuration file uses an old API spec: \"kubeadm.k8s.io/v1beta2\". Please use kubeadm v1.22 instead and run 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["your configuration file uses an old API spec: \"kubeadm.k8s.io/v1beta2\". Please use kubeadm v1.22 instead and run 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}
amazon-ebs.{{user `build_name`}}:
amazon-ebs.{{user `build_name`}}: PLAY RECAP ******************************************************************************************************************************************
amazon-ebs.{{user `build_name`}}: default : ok=55 changed=43 unreachable=0 failed=1 skipped=205 rescued=0 ignored=0
amazon-ebs.{{user `build_name`}}:
==> amazon-ebs.{{user `build_name`}}: Provisioning step had errors: Running the cleanup provisioner, if present...
==> amazon-ebs.{{user `build_name`}}: Terminating the source AWS instance...
==> amazon-ebs.{{user `build_name`}}: Cleaning up any extra volumes...
==> amazon-ebs.{{user `build_name`}}: No volumes to clean up, skipping
==> amazon-ebs.{{user `build_name`}}: Deleting temporary security group...
==> amazon-ebs.{{user `build_name`}}: Deleting temporary keypair...
Build 'amazon-ebs.{{user `build_name`}}' errored after 10 minutes 36 seconds: Error executing Ansible: Non-zero exit status: exit status 2
==> Wait completed after 10 minutes 36 seconds
==> Some builds didn't complete successfully and had errors:
--> amazon-ebs.{{user `build_name`}}: Error executing Ansible: Non-zero exit status: exit status 2
==> Builds finished but no artifacts were created.
my config file:
{
"kubernetes_series": "1.27",
"kubernetes_semver": "v1.27.0",
"kubernetes_rpm_version": "1.27.0-0",
"kubernetes_deb_version": "1.27.0-00",
"kubernetes_source_type": "pkg",
"kubernetes_http_source": "",
"kubernetes_rpm_repo": "",
"kubernetes_rpm_gpg_key": "\" \"",
"kubernetes_rpm_gpg_check": "True",
"kubernetes_deb_repo": "\" kubernetes-xenial\"",
"kubernetes_deb_gpg_key": "",
"kubernetes_container_registry": "registry.k8s.io",
"kubernetes_load_additional_imgs": "false",
"kubeadm_template": "etc/kubeadm.yml",
"containerd_version": "1.6.20",
"containerd_sha256": "1d86b534c7bba51b78a7eeb1b67dd2ac6c0edeb01c034cc5f590d5ccd824b416"
}
The error seems to suggest that you're using a now-deprecated version of the kubeadm config file.
This was resolved a couple weeks ago in but hasn't made it into a release yet. You'll be able to work around it for now by using the latest from the master branch until a new release is cut.
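For anyone hitting this before the next release: the root cause is the kubeadm config template still declaring the v1beta2 API, which kubeadm 1.27 no longer accepts. A rough sketch of the change in the template (the exact file contents on master may differ, this is just the API bump):

```yaml
# etc/kubeadm.yml template: bump the deprecated API version
apiVersion: kubeadm.k8s.io/v1beta3   # was kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "{{ kubernetes_semver }}"
imageRepository: "{{ kubernetes_container_registry }}"
```

With the template on v1beta3, `kubeadm config images pull --config /etc/kubeadm.yml` should no longer reject the spec.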
@mboersma @jsturtevant we spoke about releases in the last office hours, looks like we could do with doing one so that we support Kubernetes v1.27 with a tagged release of image-builder.
ok then let me use master branch to build 1.27 related amis. Thanks a lot!
I may not make this office hours due to moving house and everything being all over the place in the new house right now. I will try though.
We need to do a fresh release of image-builder, but there is some confusion about the process and the last attempt apparently didn't publish a tagged image.
@kiran keshavamurthy @jsturtevant @Marcus Noble should we get together and see if we can sort things out? (Or does anyone already have a handle on things?)
Yes, sorry I missed the meeting today. I didn't have it on the calendar. Can I do something later in the week?
"confusion about the process" - is there anything in it related to the Windows images?
@mloskot not sure I follow your question. This is for the release of the image-builder Docker image and tagging the repository. The Docker image is Linux only but can be used with various providers to create Windows images. Does that help?
folks up for doing a session on Thursday? Maybe 8:30 Pacific so folks in European timezones can join if they want?
@jsturtevant Yes, I realise it's not very concrete. I was trying to make myself aware of image-builder issues w.r.t. Windows.
I don't know if I can create a meeting invite, but we can re-use the link to the weekly meeting. We can all just sign on at that time.
I’ll try and attend but might be late.
@kiran keshavamurthy does that work for you? I think you have the most context but I am sure we can figure it out 🙂
I think so. I have some home repair guys coming in between 8-10am. But I should be able to be on the call and chime in.
Just found another location we need to update owners:
I've started a PR to add some Makefile stuff and docs for releasing, just FYI so we don't duplicate effort.
I just noticed that the next date in the office hours notes is down as the 15th, but the next in the calendar is the 22nd. Am I correct that it's a mistake in the notes, or are we wanting to meet up early?
That was my mistake, date math is hard. I updated the notes to the 22nd.
Has anyone tried running the image builder with VirtualBox lately?
After pretty good experiments on a Windows host (with WSL), I'm now trying the IB's canonical way, that is, on a Linux machine with Ubuntu 22.04, and I'm getting errors. But before I start spamming GitHub with issues and PRs, I'd like to confirm whether the IB workflow w/ VirtualBox is still sound or indeed needs updating.
are you building linux images on wsl? I haven't used VB provider from wsl
@jsturtevant
Yes, I have experimented with it, but it requires a hybrid environment: WSL is only used as proxy running Ansible, Packer and Vagrant, but VirtualBox runs on Windows host
Here is my branch with single commit with scratchnotes and changes I had to apply as well as full log
However, I've ditched this idea as there seems to be too much gymnastics needed, plus hardcoded IPs, and I don't see how this could be made into generic changes approvable by image builder and SWDT.
Instead, I've got a Linux machine where I'm going to try building images
I'm experiencing a weird error when building a RHEL8 OVA for CAPV; after defining values in additional_components.json just like I've done for Ubuntu 18/20/22:
"additional_registry_images": "true",
"additional_registry_images_list": "ghcr.io/kube-vip/kube-vip:v0.5.5",
I see the following error when Packer starts the Ansible provisioner:
vsphere-iso.vsphere: TASK [Gathering Facts] **
vsphere-iso.vsphere: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: command-line line 0: garbage at end of line; \"-o\".", "unreachable": true}
Even if I use just node as image name (since it's shorter than ghcr.io/kube-vip/kube-vip:v0.5.5), I still get the same error
Ok, figured it out; somehow packer/ansible/ssh in my build-environment image introduced this new behaviour; I rolled back that image and the issue is no longer present.
I'll try rebuilding with latest versions of all binaries involved and see if the behaviour re-occurs.
Looks like the difference is Ansible 2.14.5 vs 2.15.0; pinning to 2.14.x to avoid the issue for now
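If you install Ansible via pip, a minimal pin might look like the following (a hypothetical requirements fragment; image-builder's own ensure scripts manage the version differently, so adapt to your environment):

```text
# requirements.txt: pin ansible-core below 2.15 to avoid the ssh-arg regression
ansible-core>=2.14.5,<2.15
```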
Sure thing, though I'm not using make deps to install stuff, so my environment is not necessarily representative.
Something I've noticed while building node templates (OVAs) for vSphere: Ubuntu 20.04 produced 1.7 GB OVAs, while Ubuntu 22.04 produces 3.0-3.5 GB OVAs (for reference, RHEL8 is 1.8 GB).
What can explain the increased size between 20.04 and 22.04? Should we add a custom role for 22.04 to uninstall a load of new packages (for instance the frustrating needrestart, just as a random example)?
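As a sketch of what such a custom role's task could look like (entirely hypothetical; the package list is an illustration only, not a vetted set of removals):

```yaml
# hypothetical cleanup task for Ubuntu 22.04 node images
- name: Remove packages not needed on node images
  ansible.builtin.apt:
    name:
      - needrestart        # example only
    state: absent
    purge: true
    autoremove: true
```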
PSA: image-builder branch renaming and cruft removal
The image-builder project initially tried to collect VM image-building utilities for Kubernetes in one repository. Over time, the kube-deploy imagebuilder and konfigadm tools have become unmaintained, while the images/capi area has remained active.
Users and developers are only interested in this latter "Image Builder for Cluster API" area. The presence of the other tools (which have not had code changes in over two years) is an impediment.
We (image-builder maintainers) propose removing these unmaintained projects to simplify the repository. Additionally, we would like to begin using "main" branch nomenclature at roughly the same time as the "cruft removal" described above.
If you have any feedback on these proposed changes, please let us know by commenting on either of the issues listed below before Tuesday, May 30.
Removing unmaintained projects from image-builder · Issue #1143 · kubernetes-sigs/image-builder (github.com)
Rename "master" branch to "main" · Issue #1161 · kubernetes-sigs/image-builder (github.com)
Hurray, then I can stop doing sparse checkouts on that repo 😄
I have just learnt that test-infra has a component called "image-builder" 🙈
This might add some weight to the proposal of also renaming the project to be something like cluster-api-image-builder
In that train of thought, openshift has an imagebuilder too
It's unavoidable really; descriptive names for building images don't vary a lot 😄
True. But if we can be more specific without it being a problem then that might be worth doing 🙂
Has anyone got experience with RHEL8 and timing issues with regards to containerd configuration (through - files)?
Hi team, I'm trying to create an Ubuntu CAPI image. I would like to know how to authenticate to it with a root/capi user and password?
You would need to add an SSH public key to it as part of your cloud-init as the password is not generally available. Once you have the key on the machine, you could log in using that and change the password as you wish. Personally I wouldn't recommend that due to the security implications that would introduce.
Thanks @Drew Hudson-Viles, pls tell me if this will work - in packer.json there is "ssh_username=builder". So can I log in like ssh builder@IP with a password? To create an image can I use PACKER_FLAGS="--var 'kubernetes_rpm_version=1.24.0-0' --var 'kubernetes_semver=v1.24.0' --var 'kubernetes_series=v1.24' --var 'kubernetes_deb_version=1.24.0-00' --var 'disk_size=10240' --var 'ssh_password=enggfusion' --var 'ssh_username=enggfusion'" make build-kubevirt-qemu-ubuntu-2004
No problem.
So, the username and password bit you've referred to is for the builder, not the end result of an image, i.e. these will not be available on the image once you've built it.
See more information on the SSH communicator here.
If you're building from a standard, unaltered Ubuntu image then once the image is built, the only username available by default will be the one supplied by Ubuntu, which is ubuntu
To get your own username and password onto the image, you'd need to provide user-data via the cloud-init "method".
Depending on your infrastructure for creating the VM from this image this can vary.
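For example, a minimal cloud-init user-data sketch that creates a user with a password (all names and values here are hypothetical; generate the password hash yourself, e.g. with mkpasswd -m sha-512, and mind the security implications of enabling password auth):

```yaml
#cloud-config
ssh_pwauth: true             # allow SSH password auth (consider the security implications)
users:
  - name: capiuser           # hypothetical username
    lock_passwd: false
    passwd: "$6$rounds=4096$..."   # SHA-512 password hash, elided
    groups: [sudo]
    shell: /bin/bash
```

How this user-data gets attached to the VM depends on your infrastructure provider, as noted above.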
I've struggled with this exact issue quite a bit (if your nodes do not even get networking configured correctly for instance); my workaround in the end was to temporarily leave the builder account unlocked for debugging, and once issues were resolved replace it again with an image that has the builder account locked.
For reference: change the "shutdown_command" and remove usermod -L {{user `ssh_username`}} &&
Any idea on who populates this instance-data.json file while provisioning using CAPV provider?
we are seeing issues with RHEL-8 vm templates as ds.meta_data.hostname is getting populated as ‘localhost’
cat instance-data.json
{
  "base64_encoded_keys": [],
  "ds": {
    "meta_data": {
      "hostname": "localhost",
      "instance-id": "phani-rhel-26-1-9g5bt",
      "local-hostname": "phani-rhel-26-1-9g5bt",
      "local-ipv4": "10.109.10.99",
      "local-ipv6": "",
      "local_hostname": "localhost"
    }
  }
}
I've been spending some time working on the repo cleanup tasks today. I've got the following PRs opened to work towards the cleanup of old projects and the rename of master branch:
Will the folder images/capi move to the root of the repo afterwards?
I think we'll leave it for now as not sure what kind of effect that'd then have on all the other projects relying on it.
Hopefully later we'll be able to move it but I think we should minimize the potential impact right now. 🙂
I'm not going to be able to make the office hours on Monday but I've added a couple notes to the existing items on the agenda. Feel free to ping me on Slack with anything needed 🙂
So... turns out that Packer likes to introduce breaking changes in patch releases 🤨 We recently bumped Packer from v1.8.6 -> v1.8.7 and that release removed some vendored plugins, specifically for us DigitalOcean. 🤦♂️
I opened an issue to cover this:
And a PR to fix it by specifying the external plugin:
The original PR that did the version bump didn't seem to run the pull-packer-validate test that would have caught this. I'll be updating the test in test-infra to make sure this is triggered when changing the ensure script. -
Hashicorp doesn't follow semantic versioning properly in all of their products 😞
@Drew Hudson-Viles you should be good to rebase your PR on master now and hopefully should all work 🤞
Lovely. I'll get that done shortly then. Just fueling myself with coffee after a lovely night of a non sleeping baby.
D'oh, and that's why I didn't try to upgrade us to the 1.9.x series, where they mention removing default plugins. Seemed like 1.8.7 was safe, apologies!
Nah it's ok. Lesson learned 🙂 I thought those tests would have run on that PR you did so that should have caught it, but they weren't actually configured to 🤦♂️
@jsturtevant Mind giving this an /approve ? You should be able to now we've had the owners updated. 🙂
Hi... I'm working on building a few cluster images here, and I've got an issue with Rocky Linux 8, which was giving me a 404
I've submitted a PR fixing the URL, as the URL was looking for 8.7 and the actual version is 8.8
is there any roadmap to add new distros/versions? Like Ubuntu 23.04, Rocky Linux 9?
Just opened this issue as the commit
is breaking the load_additional_components feature
So, yes you're right, somehow the defaults are not there which is blowing my mind as I specifically remember putting them in originally. I can only presume I did something in the process of creating the addition that removed them. Probably an erroneous rebase picard_facepalm
I've got a fix prepared for this that also adds them "back" in as part of it (though it's not the actual fix).
As for your notes on the aws ansible collection not being officially supported in the current version. It looks like I didn't hardcode the version and an update has occurred since the original PR was created. I'll hardcode the version in to prevent errors as 5.x.x supports 2.11.0+
I'm running a few tests against the fix now wrt the issue you've raised, which includes me completely removing the collection to ensure it works without it installed; then I'll get a PR put in.
perfect, don't hesitate to share the PR with me for verification on my side
I'm just running one last test my side to ensure the Nvidia bits still work (It did last night but it was late and I want a non-tired brain to confirm it) and then we should be good to go.
Aaah thanks for raising this, I didn't hit this in my local testing but maybe didn't hit the load_additional_components use case. I shall take a look into this.
there are multiple issues here
It shouldn't really be affecting you if you're not using the role so I'll see what can be done about that. Ideally you should only need to include the hack file when making use of the S3 role/Nvidia role.
I'll take a look asap to get a fix in.
@knfoo I think since the PR is in draft …. it probably won’t get looked at
@Joe Kratzat OK, I was advised to put it into draft first - I can make it a real PR maybe it will get some attention then 🙂
oh don’t listen to me … I’m not a maintainer … 🤣
Just noticed it was in draft and that typically doesn’t ping people
Sorry @knfoo we clearly dropped the ball here. I took a look at it and don't see anything controversial.
It would be good to get more eyes on it, but at least one maintainer is on break and July 4 is imminent. Maybe if you promote it from draft status it will notify people?
@mboersma that is OK - I got attention now 🙂
I will promote it from draft and see what happens
Do we want to un-pin this issue from the GitHub issues page?
I've just noticed we have another OWNERS file here that requests CAPI maintainers to give approvals. See example of confusion on this pr:
Anyone against removing this and just relying on the image-builder OWNERS file. Now we've only got the single project I don't think it makes sense to have both anymore.
/cc @mboersma @jsturtevant
@mboersma Looks like it needs /lgtm also. I thought /approve implied that but I guess not 😕
Hello team, is there a specific release cadence in place for image-builder? If not, what are the rules to trigger a new release?
It’s pretty ad-hoc right now.
Are you looking for a new release for a specific feature?
yes
the EKS image-builder needs to consume some changes we put in the Nutanix packer flavor
and it can only reference a specific tag or a release, no longer a commit
so we need a new tag or release 😄
Makes sense.
@mboersma @jsturtevant I think we said recently we wanted to do a release before the branch rename anyway, right?
Sorry, was referring to Matt and James 🙂 They're US based so should be online later. I also need to chase up getting that PR approved so I have the right permissions too.
I can do a release today, I'll get started in a bit. Thanks for the nudge @Christophe Jauffret.
Need any help @mboersma or you ok to handle it all? 🙂
Thanks, I've got it. I would pair with you @Marcus Noble but I'm in a sig-docs meeting and multitasking. 🙂
Is it possible to use an existing qcow2 image file (one not created by image-builder), pass that file as input to image-builder, and have image-builder generate a new image file that would be Cluster API compatible?
image builder uses Ansible and Packer to configure a VM. If there was a Packer provisioner for qcow2, maybe? I don't know anything about the format
There's a command to make a QEMU-compatible image file via the make_qemu command; I can't find a Packer provisioner for qcow2.
you just need to use an infrastructure provider that can consume a qcow2 file as input; I use the Nutanix one like that, but of course you need the corresponding infra.
we could also modify the qemu one to work like that, I imagine
ok, it would be great if there were some existing reference doc for this one..
No. I am not asking whether it's possible to generate an image usable on a certain platform. I am asking whether it's possible to take an existing qcow2 image and run the image-builder process on it to add the relevant kubelet/kubeadm/certs, so that the existing qcow2 image becomes Cluster API compatible. Per the Nutanix provider doc above, it seems it could generate an image usable for Nutanix, but I could not find where it could add things to an existing image file.
the image-builder project generates CAPI-ready images for a bunch of platforms; there is no generic way
on which platform do you plan to run your image in the end?
I'm not sure I understand what you're asking exactly.
The ami_filter_owners is used to filter the list of returned AMIs that match the ami_filter_name. This should result in just the base images we require. All the ones defined in the various .json files we have should be publicly available; if not, then we need to fix those.
If you're looking to use your own base image then you can replace ami_filter_owners and ami_filter_name with valid filters that will result in your account finding the AMI you want to use.
Is that what you were looking for?
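For illustration, overriding those filters in a Packer var file might look like this (the account ID and name pattern here are hypothetical placeholders, not real values):

```json
{
  "ami_filter_name": "my-base-ubuntu-20.04-*",
  "ami_filter_owners": "123456789012"
}
```

Pass the file via PACKER_VAR_FILES so it overrides the defaults in the AMI packer.json.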
I see. Who is responsible for maintaining the base AMIs? And are the base AMIs available in all the AWS regions?
That I'm not sure about unfortunately 😞
Hopefully they are "official" distro images but I'd need to check to be sure.
Out of curiosity, what prompted the question? Are you looking to change something with CAPA?
What does "official" mean? Does it mean they are available across all AWS regions, or that we don't need to take care of their maintenance, or something else?
Official here means the account is run by either AWS themselves (AWS Marketplace) or is the account suggested to use by the distro's own documentation.
None of those images are within our control.
I'm asking because our account that is responsible for making and publishing AMIs is getting migrated. So I'm thinking more about the base AMIs 🙂
I'll not be around for this one I'm afraid. I've not long got back from my holiday and I'm a bit all over the place atm.
I've not got anything major to raise for this one anyway.
WRT what @Marcus Noble has put in the agenda so far, I'll simply say I agree with all 😄
@mboersma I took a little look at what we could do with the Makefile include. It's possible but we'd need to do a lot of updating to the Makefile tasks to make use of $(CURDIR) when referencing any files. I'm not confident enough to make all those changes as I don't understand a good chunk of that Makefile 😅
If you're interested, the basic idea would be...
In the root Makefile:
[...existing code...]
export CWD=images/capi
include $(CWD)/Makefile
Then in images/capi/Makefile:
CWD ?= ./
CURDIR := $(realpath $(CWD))
(and then update all references to also use $(CURDIR))
@Marcus Noble that does sound a bit fiddly, but not complicated. Maybe we could try it out and see how it acts IRL.
(For context, Marcus and I were discussing flattening the repository, since everything important is nested down in /images/capi. We thought if we could somehow "proxy" the nested Makefile through the root one, it would relieve most of the pain. That is, typing make -C images/capi
I’ll try and get a PR up at some point. The main thing that might cause problems is if we run any scripts that expect the cwd to be images/capi
@mboersma I finally got to putting the PR together. I'm not 100% sure I've caught everything, but the handful of tasks I checked worked as expected.
As an aside... that Makefile is horrible to work with. There's SO MUCH. 😆
I know, it's huge and hard to navigate. I'll try out the make changes soon, sorry to let it dangle @Marcus Noble. (And thanks!)
Hey y'all, is anyone still building Ubuntu-18.04 images with image-builder?
We're currently removing it from GCP because the base image is no longer available and wondering if we should remove it from all providers, even those that still have base images available. Ideally we'd like to clear out old OS versions but if there's still a need for it we'll keep it in image-builder for the time being. 🙂
We stopped building 18.04 for CAPZ in April, when it went out of support.
The only reason I haven't made a PR to remove it from azure/ is this:
With an Ubuntu Pro subscription, your Ubuntu 18.04 LTS deployment can receive Expanded Security Maintenance (ESM) until 2028.
So theoretically, there could be an enterprise user with an Ubuntu Pro sub building their own images (as we strongly recommend, the "reference images" we build aren't updated with CVE fixes for one).
Although this theoretical user can and probably has forked the image-builder repo, so they can continue using the 18.04 make targets if we remove them.
I think for the vast majority of use cases, it's at best clutter and at worst a liability to keep all the 18.04-related stuff.
Evidently I've just talked myself into removing 18.04 globally, but I'm curious to hear other opinions still.
I agree with your reasoning. It’s always possible to use an older version of image-builder if needed. Or adding a custom packer json.
If no one comes forward with a strong case for keeping it then let’s clear it out from all providers.
Could anyone confirm for me whether the qemu images can be / are ever used on VMware platforms?
I've opened a pull request to address this issue about open-vm-tools being installed in qemu images due to us sym-linking some files between providers but I want to make sure first that removing this isn't going to cause issues for anyone.
Hello folks, I'm trying to build a Flatcar image using QEMU builder and it fails with this:
$ make OEM_ID=openstack build-qemu-flatcar
It was working fine previously; I suspect the OpenSSH upgrade on my system:
...
qemu:
qemu: PLAY [all] *
qemu:
qemu: TASK [Gathering Facts]
qemu: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: command-line line 0: keyword identitiesonly extra arguments at end of line", "unreachable": true}
qemu:
qemu: PLAY RECAP *
qemu: default : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
qemu:
==> qemu: Provisioning step had errors: Running the cleanup provisioner, if present...
==> qemu: Deleting output directory...
Build 'qemu' errored after 7 minutes 55 seconds: Error executing Ansible: Non-zero exit status: exit status 4
$ ansible --version
ansible [core 2.15.1]
config file = /home/mathieu/github/kubernetes-sigs/image-builder/images/capi/ansible.cfg
configured module search path = ['/home/mathieu/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/mathieu/github/kubernetes-sigs/image-builder/images/capi/.env/lib/python3.11/site-packages/ansible
ansible collection location = /home/mathieu/.ansible/collections:/usr/share/ansible/collections
executable location = /home/mathieu/github/kubernetes-sigs/image-builder/images/capi/.env/bin/ansible
python version = 3.11.3 (main, Jun 2 2023, 13:54:39) [GCC 12.2.1 20230428] (/home/mathieu/github/kubernetes-sigs/image-builder/images/capi/.env/bin/python)
jinja version = 3.1.2
libyaml = True
$ ssh -V
OpenSSH_9.3p1, OpenSSL 1.1.1u 30 May 2023
Are you using the latest image-builder from the master branch or a tagged release?
Are you able to check if you experience the same issue using the Docker image?
That seems better from Docker - I see the image is using a different OpenSSH version:
OpenSSH_8.9p1 Ubuntu-3ubuntu0.1, OpenSSL 3.0.2 15 Mar 2022
Let me see if I can get the faulty command.
Ok, so that at least confirms that the issue is related to OpenSSH being updated. Not sure how we'll tackle that 🤔
Hmm... so maybe we just need to bump Ansible to v2.15.1?
Do you mind trying again with this value bumped up?
I'd say the opposite actually: the version 2.15.1 does not currently work with image-builder.
Let me try with your version !
That works fine with 2.11.5, so it's not OpenSSH related but Ansible related
Ok. That's good to know at least. Hopefully it gets resolved in Ansible before you do another upgrade of it in image-builder.
No worries 🙂 Glad you managed to figure it out. 😄
📣 Announcement!
I am about to begin the master -> main branch rename as outlined in this issue.
If all goes smoothly nothing should be affected, as PRs and git references should update automatically. If you notice any problems please let me know in the thread. 🙂
I'll announce again once complete so people can update their own checked-out repos / forks if they choose.
To update your local checked-out copy:
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a
😒 The test-infra PR deciding to re-run all the checks again after removing the hold label is not ideal.
Held PRs now merged.
Just waiting on confirming that CI jobs still work as expected. (See )
Both PR and periodic jobs are able to run correctly! 🙂
Rename complete! ✅ 🎉
If you experience any issues you believe are related to the rename, please don't hesitate to let us know. 🙂 If you'd like to update your local copy, there are commands in this thread you can run.
I've opened a PR in test-infra to allow image-builder maintainers to re-run our periodic jobs via Prow -
@mboersma Did you get anywhere with the Netlify config update? The book is still working (as I expected) but I'm not sure about new changes being built and deployed. I've opened a cleanup PR that we can use to test the change when we're ready to do so.
Already done 🙂 And confirmed it worked! 😄
Oh, just realised you sent this before I did the PR 😆 My notifications don't seem to be working right.
Hey y'all, I'd like to get people's thoughts on using rolling-version base images (within a set major version) and whether people think this is appropriate for image-builder.
The question comes from this PR and originally this comment. The RockyLinux links we're currently using in Nutanix are dead, but the images have been moved elsewhere. The PR suggests changing them to point to the latest of the major release, which would mean we don't need to keep updating the value, but it does mean that re-runs of the same image-builder configuration could result in different images being built. What does everyone think?
Actually, image-builder's OS tasks perform yum update / apt update during each build,
so each result is different, always including the latest security updates.
Using the rolling version as the source will give the same final result, and even better, it will improve the build time because there's no need to download packages twice.
so i approved :-D
Yeah, that is actually a really good point. I'm not sure we actually do this with other OSs (e.g. Ubuntu) but we definitely do with RHEL.
same with ubuntu
```yaml
- name: perform a dist-upgrade
  apt:
    force_apt_get: True
    update_cache: True
    upgrade: dist
  register: apt_lock_status
  until: apt_lock_status is not failed
  retries: 5
  delay: 10
```
Let's leave it a little, just in case anyone wants to weigh in, but if not I'm happy to approve that PR 🙂
security first, and also sustainable IT improvement 😎
…and i don’t like waiting
The other downside I see to using the latest base image is we can't use a checksum to ensure the image is as expected. But as we're not currently doing that for Nutanix anyway, I guess it doesn't matter.
was thinking to implement external checksum file support for this kind of case
I'm ok with using latest, because we already don't have 100% reproducible builds, since Ubuntu and others can already update underneath us (and I was planning to use "latest" for Mariner Linux support).
The tradeoff is obviously that a given CI run might break, but we already see that occasionally (usually from an old distro rusting too much, like Rocky Linux here).
Skipping the office hours this week due to low attendance and nothing pressing to discuss. 🙂
Could I please get a review on this small fix for Mac M1/2 users? 🙏
https://github.com/kubernetes-sigs/image-builder/pull/1215
@Marcus Noble thanks for that!
Maybe we should do an image-builder release soon? Several good fixes, plus Mariner Linux and maybe updating to latest packer, seems like a good milestone to release.
👍 Sounds good to me.
Anything else in the open PRs that'd be good to get in?
I think we're at a good point to do a v0.1.17 image-builder release. I can kick that off this afternoon or tomorrow if we have consensus.
Folks, I'm trying to set cdrom_type to sata as a packer variable. I have set this in a file which is passed to PACKER_VAR_FILES. However when the packer VM comes up, it comes up with the cdrom set as ide. Any suggestions on what I might be missing?
Hey Rahav, cdrom_type is not used in any of the builders so I guess it just defaults to ide, hence passing the type has no effect. I suggest adding a cdrom_type var to the builders in packer-node.json, and passing the type from the OS configs. For Photon-5 you might need sata, I guess.
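To make the suggestion above concrete, here's a rough sketch (hypothetical, not existing image-builder config) of declaring the variable with a default and wiring it into a builder; the Packer vSphere builders do accept a cdrom_type option, but the exact file layout here is illustrative:

```json
{
  "variables": {
    "cdrom_type": "ide"
  },
  "builders": [
    {
      "type": "vsphere-iso",
      "cdrom_type": "{{user `cdrom_type`}}"
    }
  ]
}
```

Then a Photon-5 OS config (or a file passed via PACKER_VAR_FILES) could override it with `"cdrom_type": "sata"`.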
I could not understand the usage of image-builder. Can't we use any ami with cluster-api?
image-builder is designed to set a bunch of sensible defaults as well as install the tools required for it to work with capi.
On top of that it supports multiple clouds, multiple distros and can be configured to install additional container images, nvidia drivers and more.
A base ami, or any other image from any other provider would require all of the bits the image builder project adds to be done manually.
It just takes the toil out of building images really.
Thanks @Drew Hudson-Viles. So we just can't use any ami with cluster-api, right?
Well you could, but you'd need to install things on top of it, like kubelets, containerd (or alternative) etc.
capi isn't something you can just install and get all those things with it 🙂
@Drew Hudson-Viles Ok. Which means Cluster API does not install kubeadm and the necessary applications and tools by itself, like in the case of Kubespray.
Yeah that's right. The best way to think about it is that Cluster API is just a collection of APIs that allow you to manage multiple clusters from a single management cluster - that could be a full blown one or kind that you run locally.
@Drew Hudson-Viles We can only use an ami built for cluster-api. Reading through the image-builder book, it seems it is not an easy route to build a custom ami. As an example, I need to build an aarch64-based ami as we are using AWS Graviton processor based instances, and there seems to be no way to build the required ami with image-builder at the moment. Can you advise some alternatives? Any documentation on what needs to be provisioned for an ami to be compatible with cluster-api? Then I can probably create an ami manually for the time being.
I can't say for certain as I've not used graviton to date and I'm out of AWS at the moment so can't do any testing.
You may be able to modify this to get it to work with Graviton-based AMIs, but without any testing I cannot confirm for sure. Depending on the package manager it uses, etc., you may need to use a different one.
If you do want to do it manually for now then there is a decent amount of documentation available. I'd recommend taking a look through the CAPI book and the cluster-api-provider repo
In the AWS CAPI provider readme there is information about using pre-baked AMIs which I'd suspect Amazon already provide to save any leg work on your side.
The final option I'd say is to read through the Ansible playbooks in the image builder repo and see what's being done at each stage and attempt to replicate that as part of the graviton AMI.
@Drew Hudson-Viles Your thoughts align. Thanks. The only issue is in image-builder the amd64/x86_64 is hardcoded. Let me do a little brainstorming and will let you know the results.
There's actually a PR I've just approved to add that support:
Once that's merged in you should be able to by setting the kubernetes_goarch var. I think you also need to change the base image you use too. I'm not sure as I haven't done it myself.
If you add kubernetes_goarch to your user provided packer vars then it’ll use the value you provide rather than the default. (See here for details on how if you’re not sure: https://image-builder.sigs.k8s.io/capi/capi.html#customization)
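For illustration, a minimal sketch of such a user-provided packer var file (the file name and make target below are hypothetical; PACKER_VAR_FILES is the documented env var):

```json
{
  "kubernetes_goarch": "arm64"
}
```

Then pass it in when building, e.g. `PACKER_VAR_FILES=arm64-vars.json make build-ami-ubuntu-2004` (target name illustrative).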
@Marcus Noble What about other hard coded values? There are a lot of places where amd64/x86_64 is hard coded.
Sorry, I didn’t realise there was still more outstanding work. Looks like we have this issue open to track it: https://github.com/kubernetes-sigs/image-builder/issues/936
Can we have prebuilt image for Amazon Linux 2023 with Kubernetes v1.27.3. Tried building one with image-builder but epel package install failed, since AL 2023 does not allow the same.
What do you mean? What provider are you building for? What vars etc?
Hello. In the OVF file of the Flatcar image for CAPV, variables for user config are missing. Is it intentional? While testing images via the UI, they are very useful.
Here is the related place. I expect to see these variables there.
Here is the related part in flatcar_production_vmware_ova.ovf
```
Flatcar Container Linux Virtual Appliance
ovf:key="guestinfo.hostname" ovf:value="">
Hostname
ovf:key="guestinfo.ignition.config.data" ovf:value="">
Inline Ignition config or coreos-cloudinit data (cloud-config or script)
ovf:key="guestinfo.ignition.config.data.encoding" ovf:value="">
Encoding for Ignition config or coreos-cloudinit data (e.g., base64)
ovf:key="guestinfo.ignition.config.url" ovf:value="">
URL to Ignition config or coreos-cloudinit data (cloud-config or script)
ovf:key="guestinfo.dns.server.0" ovf:value="">
Primary DNS (only for coreos-cloudinit)
ovf:key="guestinfo.dns.server.1" ovf:value="">
Secondary DNS (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.name" ovf:value="">
Name for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.mac" ovf:value="">
MAC for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.dhcp" ovf:value="no">
DHCP support for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.role" ovf:value="public">
Role for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.ip.0.address" ovf:value="">
Main IP for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.ip.1.address" ovf:value="">
Additional IP for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.route.0.gateway" ovf:value="">
Main route gateway for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.route.0.destination" ovf:value="">
Main route destination for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.route.1.gateway" ovf:value="">
Additional route gateway for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.0.route.1.destination" ovf:value="">
Additional route destination for network interface 0 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.name" ovf:value="">
Name for network interface 1 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.mac" ovf:value="">
MAC for network interface 1 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.dhcp" ovf:value="no">
DHCP support for network interface 1 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.role" ovf:value="private">
Role for network interface 1 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.ip.0.address" ovf:value="">
Main IP for network interface 1 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.route.0.gateway" ovf:value="">
Main route gateway for network interface 1 (only for coreos-cloudinit)
ovf:key="guestinfo.interface.1.route.0.destination" ovf:value="">
Main route destination for network interface 1 (only for coreos-cloudinit)
```
In Kube 1.27, the in-tree kubelet credential provider for AWS was removed (). This followed GA of the external kubelet credential provider feature in 1.26.
At my organisation we pull most of our images from private ECR repos, so since this removal in 1.27, we need the external ecr-credential-provider binary in our CAPV OVAs.
What are the maintainers' thoughts on including external credential provider binaries (such as ecr-credential-provider) in published images, such as the OVAs distributed at ?
I need to double check, but I don't think CAPA have included it yet either.
I've opened an issue to track removing old end of life OSs from image-builder defaults:
Hi Folks, I was trying the build-node-ova-vsphere-photon-5 target to build an ova with a different Photon ISO, but the script is stuck at step vsphere-iso.vsphere: Waiting for SSH to become available.... When tried with higher verbosity using FOREGROUND=1 PACKER_LOG=1, I get the output below. I am running the scripts on an Ubuntu 20.04.5 LTS machine and using vSphere 7 as the hypervisor.
Not sure how packer works, but I could see the ssh_username and ssh_password are already defined in images/capi/packer/ova/packer-common.json. Any inputs?
==> vsphere-iso.vsphere: Waiting for SSH to become available...
2023/07/26 14:16:13 packer-builder-vsphere-iso plugin: [INFO] Attempting SSH connection to 10.xx.xx.xx:22...
2023/07/26 14:16:13 packer-builder-vsphere-iso plugin: [DEBUG] reconnecting to TCP connection for SSH
2023/07/26 14:16:13 packer-builder-vsphere-iso plugin: [DEBUG] handshaking with SSH
2023/07/26 14:16:17 packer-builder-vsphere-iso plugin: Keyboard interactive challenge:
2023/07/26 14:16:17 packer-builder-vsphere-iso plugin: -- User:
2023/07/26 14:16:17 packer-builder-vsphere-iso plugin: -- Instructions:
2023/07/26 14:16:17 packer-builder-vsphere-iso plugin: -- Question 1: Password:
2023/07/26 14:16:19 packer-builder-vsphere-iso plugin: [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password keyboard-interactive], no supported methods remain
2023/07/26 14:16:19 packer-builder-vsphere-iso plugin: [DEBUG] Detected authentication error. Increasing handshake attempts.
2023/07/26 14:16:26 packer-builder-vsphere-iso plugin: [INFO] Attempting SSH connection to 10.xx.xx.xx:22...
2023/07/26 14:16:26 packer-builder-vsphere-iso plugin: [DEBUG] reconnecting to TCP connection for SSH
2023/07/26 14:16:26 packer-builder-vsphere-iso plugin: [DEBUG] handshaking with SSH
What file specifically are you referring to? I don’t know of any spdx files but you can find the Dockerfile here: https://github.com/kubernetes-sigs/image-builder/blob/main/images/capi/Dockerfile it might give you some hints.
spdx is a format for SBOMs (software bill of materials) so it would be metadata about contents and licenses of package inside a container image
Does this raise any concerns for the usage of Packer in image-builder?
https://discuss.hashicorp.com/t/hashicorp-projects-changing-license-to-business-source-license-v1-1/57106
https://www.hashicorp.com/blog/hashicorp-adopts-business-source-license
Somewhat, yes.
I don’t understand enough to know if it will cause problems for image-builder but I suspect it could do.
I’ve added it as an agenda item to the next office hours meeting on Monday.
I’m AFK until Monday. If someone else has the time before then would you mind opening an issue to track this?
Sure, I can open an issue. Thanks Marcus!
Thank you @Marcus Noble and @mboersma! I shall keep an eye on the issue
If the restriction is on using Packer directly, could one at least import packer plugins and SDKs as a Go dependency to build one's own CLI, or is that out of bounds too?
I haven’t looked. I’m not sure what, if any, libraries there are and if they provide enough to replace what we currently do. It’s also possible the libraries are under the new license too, as not all of it has remained under the old license.
Just took a quick look; as far as I can see all the Packer code is now under the BUSL license. 😔
Opened license exception request:
@Drew Hudson-Viles Do you have an example of how one might make use of loadadditionalcomponents? I'm trying to see if I can make use of it in our current builder pipelines at Giant Swarm without too much changes but I'm not sure 😬
I believe I do, one tick I'll have a look though my various testing I did.
Just need to flick my PC on as they are all on there. give me 2-3 minutes
So you have to enable the 'additional component' as well as the role itself with a couple of params.
For example, I have this line to add additional container images into an image - I've omitted the many, many additional ones to make it more readable 😄
"ansible_user_vars": "load_additional_components=true additional_registry_images=true additional_registry_images_list=docker.io/k8scloudprovider/openstack-cloud-controller-manager:v1.25.0,docker.io/k8scloudprovider/cinder-csi-plugin:v1.25.0,k8s.gcr.io/sig-storage/csi-attacher:v3.4.0,install_falco=true install_trivy=true"
Nice! 😄 That's actually much easier than I thought. We'd want to make use of additional_executables_list and additional_executables_destination_path then 🙂
Edit: Oh and additional_executables=true
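To double-check my understanding, a hedged sketch of what that might look like in the packer vars, following the same ansible_user_vars pattern as the container-images example above (the URL and destination path are made up):

```json
{
  "ansible_user_vars": "load_additional_components=true additional_executables=true additional_executables_list=https://example.com/mytool additional_executables_destination_path=/usr/local/bin"
}
```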
🤔 It'd be nice if we could specify different destinations for different executables, but as we currently only need one we can work with that I think
Yeah that's exactly it.
I was thinking that too when I played with the additional executables, but we'd have to consider how that'd work in terms of providing parameters. It'd be dirty, but we could do a destination list where each list item uses the same index as the executable list... but it's dirty and prone to error 😄
Could format it similar to the volume arg with docker - e.g. ${download_url}:${target_path},....
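As a quick sketch of that idea (all names hypothetical, not image-builder code): split each comma-separated item on its *last* colon, since URLs contain colons themselves, mirroring how docker handles `-v`:

```python
def parse_executables(spec: str):
    """Split a "url:dest,url:dest" string into (url, dest) tuples.

    Splits each item on the last ':' so that scheme colons in
    URLs (https://...) are preserved.
    """
    pairs = []
    for item in spec.split(","):
        url, _, dest = item.rpartition(":")
        pairs.append((url, dest))
    return pairs

print(parse_executables(
    "https://example.com/tool.tar.gz:/usr/local/bin,"
    "https://example.com/agent:/opt/agent"
))
```

This keeps the packer-to-ansible plumbing a plain string while still carrying per-executable destinations.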
It'd also be nice to be able to do things like checksum validation etc. but that gets complicated very quickly 😆
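The verification part at least is simple on its own; a hypothetical helper (not image-builder code) that checks a downloaded artifact against an expected SHA-256 digest before installing it:

```python
import hashlib

def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if data's SHA-256 digest equals expected_hex."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()

blob = b"example artifact bytes"
good = hashlib.sha256(blob).hexdigest()
print(sha256_matches(blob, good))       # True
print(sha256_matches(blob, "00" * 32))  # False
```

The complicated part is the interface: where the expected digests come from (inline in the var string, a sidecar checksums file, etc.), not the hashing itself.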
oh! 🤦♂️
We need to also unpack the downloaded tar. UGH!
How do other people go about installing additional agents / executables on their images created with image-builder?
aaah yeah I hit that problem... We do need to think about how we support that process but again it's a complex one due to all tars being packaged differently 😞
I don't have a solution to that one right now.
In my ansible playbook where I encounter that same challenge for a list of binaries I'm injecting, I use an optional parameter like: extra_opts: --strip-components=2 (so far I've actually only used it for --strip-components :P)
You're not doing that with image-builder currently though right? As far as I know there's no way of passing in extra archives to extract anywhere? You're referring to the unarchive action not creating a subdirectory, yeah?
Correct, that's not in image-builder's task. But in one of my own ansible playbooks:
```yaml
ansible.builtin.unarchive:
  src: "{{ item.url }}"
  dest: "{{ archive.path }}"
  remote_src: yes
  extra_opts: "{{ item.extra_opts | default(omit) }}"
```
But since these additional_components vars are just a list of strings passed through packer to ansible, it gets more complicated where we'd have to make it accept complex patterns
👍 Thought so, just wanted to double check I wasn't missing something 🙂 Thank you
I guess
Yeah. I think it needs designing first in an issue and agree on what would be needed. This is needed by another team at my company so it might be I get them to contribute it to image-builder if they don't find another solution. 🙂
You could also just add a custom-role in image-builder and have that do the download/extraction/installation
We currently make use of the container image of image-builder in a Tekton pipeline to build our images. Adding in a custom role is just as complicated for us currently 😆
You'd have to mount the custom roles in your container and then reference it; that's about it?
🤔 hmm.... actually, that might be straight forward. We could just store the roles in a repo somewhere and have them checked out to a shared workspace in our pipeline.
If you go that route, share what you did. Always curious to see how people solve these kind of customizations 😄
I still think there might be a use case for including the ability in image-builder directly but would need to think about how the interface for it looks as it could be really messy. I know people have asked in the past about installing security related agents into images.
Wouldn't most of those kinds of agents need config anyway, and then it would already deserve a custom role to cover the whole thing?
Yeah that's what I'm thinking, it gets too messy and I'm not sure if there's an easy way to handle that kind of thing.
Perhaps we should just document the 'workaround'; though you can't add comments in Packer's json files (reason to migrate to hcl 😇)
> reason to migrate to hcl
There's an issue for that:
I'm going to be (hopefully) joining from my phone as I'll be out but I might be a few minutes late.
@Zach Wachtel has joined the channel
I am liking that we're seeing a good number of new contributors with each release! 😄 💙
@salisbury_joe has joined the channel
📣 PR tests are currently failing due to this:
Please bear with us while we get this fixed. Until then you can assume that failures due to Nutanix are ok. Once we have the fix in place, PR tests can be re-run.
Hi @Marcus Noble, I left some comments on the issue, open to discuss if needed
If you're able to handle the requirements needed for v0.8.0 that'd be awesome. I don't know anything about Nutanix so went for pinning the version to what was previously passing for now.
yes i can do it
merge your pinning if you want and i will overwrite it in the next PR with the fix tomorrow
Perfect! 😄 Thank you!
Hi folks, would anyone be willing to co-sponsor @Dimple Raja Vamsi Kosaraju’s membership of k8s-sigs, who has been contributing a fair amount to image builder of late?
thank-you-very-much @naadir and @jsturtevant for sponsoring, will go ahead and raise an issue for joining the k8s-sigs org
Raised Issue#4408 🙂
I won't be able to make today's unfortunately as I'm out with the family for the bank holiday.
I have a few topics but I suspect that today might be quiet as I know Matt is also out today. None of my topics are urgent, mostly just updates, so I’ll postpone if no one else ends up joining.
The meeting notes (above) have been updated with what was discussed today.
Main discussion points:
Hello folks, is there a way to run image-builder in an airgapped environment? AFAIK Internet access is required for:
Hi! 👋
I think you're going to struggle to do this with the repo to be honest. I think if you really wanted to do this you'd have to fork the repo into your environment and make some pretty significant changes to it to get it to use your own package repo, images etc. But then you risk it falling out of line with everything else.
I'm not sure if anyone else has experience with this and can help further, but as the internet is required in its current form I'm not sure how to recommend proceeding with this.
As Drew said, the answer is pretty much “no”. There are too many external dependencies that would need to be replicated locally; it wouldn’t be worth it. It would be less work to build a custom script to handle your specific use case.
I’m curious as to why building of the images must be done in an air-gapped environment. Could you explain your use case more?
Thanks for your insights Drew and Marcus! We are trying to get Image-builder working offline with Artifactory repos for debs/RPMs, local registry for container images, local mirrors for binaries/executables pre-downloaded from github. Wanted to know if that's even a feasible path
Well of course you technically can use your own mirrors for everything, it'll just be a lot of work to keep your changes working with every new image-builder release I think.
I'd start looking at creating a custom role and referencing it in as a pre ansible role. In there you'd have to change configuration of all relevant repositories.
But that won't cover other tasks in the existing roles that reference their own sources, so you're going to have to go through all these other roles to see if you can override sources that are in use in a reproducible way.
It is possible, and we do it to build images in airgapped envs, for Photon & Ubuntu. As long as you have internal mirror repos for OS pkgs and internally built/hosted k8s & friends pkgs. Let me know if you run into any specific issues.
Thank you all for your valuable inputs. I'll reach out if I face any issues.
Historically, a few years ago we ran image builder in an AWS environment that was isolated from the Internet, unless there's been significant changes in external dependencies stuff should have had parameters for pointing to an internal equivalent
@kiran keshavamurthy qq, can snap commands pull from a private snap store proxy or other internally hosted methods? I was looking at this task and was wondering if we can provide a source for it.
What framework is used for tests within the image-builder repo? I have a need to set up testing of an OVA that we're building through Packer, and I want to see how other projects approach this.
No frameworks, it's all custom bash scripts. This is also the reason the tests aren't as good as we'd like; there's a lot of functionality we haven't got tested because it's a lot of work and not clear to contributors how to add more.
If you come across some nice frameworks for this we'd be very interested
But I'll be honest, I'm far from an expert on this area so maybe some of the other contributors can comment more 🙂
Yeah I did notice goss, looks like you combine it with the build itself though, and I want to test after a deployment of a completed ova 🙂
Yeah, I think it's used as validation during the build so if it doesn't match what's expected we fail the build rather than continuing to the end.
I've used inspec in the same manner, but it doesn't cover testing deployment scenarios, so I'm looking for something more elaborate.
goss is used to validate the image is created, in the image-builder CI, if the image builds properly and passes goss tests, we call it good.
Before we produce images for CAPZ, we do deploy the image in a cluster, to validate the cluster comes up.
Once it's a released image we have CI in CAPZ that runs kubernetes conformance tests a few times a day, so we would catch any major regression there.
Other providers like AMI and GCE also run e2e tests in CI after images are produced so would catch regressions (we had one last week in GCE)
Same as what James said. We deploy k8s clusters with the OVAs and run install/upgrade/scaling/conformance etc. tests. We have an internal testing framework in python to run the other tests that are needed.
Hello. Is there anyone who builds OVA images for CAPV inside a container/pod? How do you install VMware tools to the container image?
Or do you mean you need the tools inside the container while building the image?
I did mean the container that builds the node images. In image-builder project, vmware-iso builder is used. See
It has dependencies:
This VMware Packer builder is able to create VMware virtual machines from an ISO file as a source. It currently supports building virtual machines on hosts running VMware Fusion for OS X, VMware Workstation for Linux and Windows, and VMware Player on Linux. It can also build machines directly on VMware vSphere Hypervisor using SSH as opposed to the vSphere API.
```dockerfile
FROM quay.io/giantswarm/capi-image-builder:1.6.8

USER root

# Check
RUN wget -q -O /tmp/VMWareWorkstation.bundle \
    && chmod +x /tmp/VMWareWorkstation.bundle \
    && /tmp/VMWareWorkstation.bundle --console --required --eulas-agreed \
    && rm -rf /tmp/**

USER imagebuilder
```
but it also relies on the Virtual Infrastructure eXtension (VIX) SDK.
Can you double check with the latest release of image-builder? If it’s still missing can you please open an issue? 🙂
Unless @kiran keshavamurthy happens to already know?
Just checked and the binaries (vmplayer, vmrun) don’t exist. I couldn’t find any code to install those into the image-builder container image either.
In theory, yes. But there’s a lot to keep track of so things get missed.
@Erkan Erol: I use VMWare Workstation on Linux to create the images with image-builder, since VMWare Player does not work; the command I use is make build-node-ova-local-ubuntu-2204
@Alessandro Giorgio Togna Do you use the Docker image or are you running the code directly?
Yeah, Erkan is looking for how to run it within a container so that we can make use of it in our existing Tekton pipelines that builds our images for other platforms.
I know, but I think you cannot use VMWare Player, you need the full Workstation
> I know, but I think you cannot use VMWare Player, you need the full Workstation
I am not fully sure, but the packer builder doc says:
This VMware Packer builder is able to create VMware virtual machines from an ISO file as a source. It currently supports building virtual machines on hosts running VMware Fusion for OS X, VMware Workstation for Linux and Windows, and VMware Player on Linux. It can also build machines directly on VMware vSphere Hypervisor using SSH as opposed to the vSphere API.
I'll ask in the office hours in 10 min to see if anyone there knows an answer 🙂
packer might be able to do it, but image-builder does not...
image-builder is just an opinionated wrapper around packer. If you know how to do it with Packer we can update this project to support it. (Hopefully)
@Marcus Noble Could you learn anything yesterday? Was there anyone who has an answer?
Sorry, forgot to report back. I’m afraid not. Attendance was very low and no one there had CAPV experience. 😔 Did you manage to get any response from the CAPV team?
> Hello. Is there anyone who builds OVA images for CAPV inside a container/pod? How do you install VMware tools to the container image?
When building from the container we recommend using vSphere builders.
Are you doing that with image-builder? I thought we only had the OVA vSphere targets? (I may be misunderstanding though)
> From the images/capi directory, run make build-node-ova-<hypervisor>-<OS>, where <hypervisor> is your target hypervisor (local or vsphere) and <OS> is the desired operating system. The available choices are listed via make help.
It is possible to use a remote vSphere to build images.
There are several targets not being tested due to not having infrastructure to test them on. It's a known issue and something we're hoping to improve, but without access to the relevant cloud infrastructure there's not much we can do other than rely on contributors and users to test and report issues. 😞
Update: make build-node-ova-vsphere-flatcar worked on my first attempt. Many people reported they use vSphere here. It seems it is a de facto standard.
Now that we've merged the pkgs.k8s.io change, I think we should do an image-builder v0.1.19 release (before we update to containerd 1.7). I'm happy to do a release today unless there are reasons to wait or other objections.
I meant to ask this in the last office hours and forgot:
The CFP for the Kubernetes Contributor Summit in Chicago closes on Friday. Do we want to submit a session for image-builder at all?
Would anyone be interested in either:
I love the idea but I doubt I could swing it to get to Chicago just yet 😛. I was going to try and get something for KCD in London (which I see you're at) but too much on my plate meant I missed the deadline even though my boss pestered me (he's one of the organisers).
All that being said I think a state of the project would be a good one as I do wonder sometimes if some people even know about it. In my previous job we never used it and it would have made life simpler if we'd known about it.
Yeah, it does kinda depend on who will be at KubeCon NA or not as to if it'll be useful.
I probably can't make it to Chicago, but thumbs up to a session, maybe we could participate remotely.
We should add a Breaking Changes section to the release notes to indicate that people might need to change the kubernetes_deb_version variable if they're providing it themselves. (see this thread - )
Anyone ever noticed that ntp settings are not picked up properly in Ubuntu2204 vSphere OVA, even though upon deployment guestinfo.userdata does have ntp.enabled: true and ntp.servers: [
i'd say so. but image-builder allows you to add your custom role when building an image, so if you know that all your clusters will have the same ntp config, then you can build it into the image and not rely on cloud-init
Unfortunately that's very much not the case for me, but also, it seems the issue has gone away with a new run of image-builder :)
hmm, was anyone else able to successfully build images with the last release (1.26.9 etc.)?
I'm unable to install the debs for the new releases
If GCE, it might be now fixed with this PR that's just merged -
Oh. Hmmm... maybe that has the same problem as GCE then? Are you able to try overwriting the kubernetes_deb_version to 1.24.15-1.1 to see if that works?
but yeah, it might be identical as the build version is still the old one
Oh actually, I don't think it is that. The GCE one was because it was an override that wasn't updated during the original change. So unless you're providing the kubernetes_deb_version variable yourself, it should be working unless we've missed something. 😞
> vsphere-iso.vsphere: fatal: [default]: FAILED! => {"cache_update_time": 1694680197, "cache_updated": false, "changed": false, "msg": "no available installation candidate for kubelet=1.24.17-00"}
Yeah, but i guess since the release yesterday the old -00 postfix needs to be changed and i missed that
Ah yeah. I guess we need to announce that more clearly. I didn't think about people setting the values themselves. 🤦♂️
Sorry, I didn't realize there were overrides of the kubernetes_deb_version elsewhere; should have caught that in the PR.
@mboersma it was in our overrides that we have locally to build our images; I've missed that those need migrating
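For anyone else hitting this in their local overrides: a hedged example of the kind of change needed after the pkgs.k8s.io migration (the exact revision suffix varies per Kubernetes release, so check the new package repo rather than copying this value):

```json
{
  "kubernetes_deb_version": "1.24.17-1.1"
}
```

where the old apt.kubernetes.io style value would have been 1.24.17-00.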
@mboersma I can't seem to get the ensure-ansible-lint.sh working on my Mac 😞
ensure_py3_bin ansible-lint keeps failing for me and I'm not sure why. It looks like it is installed (as pip3 show ansible-lint works) but I don't have the CLI available anywhere 😕
oh actually, looks like my PATH might not be right 🤔
I'm sure it won't be the last time 😄 I did exactly that the other day and wondered why ansible wasn't working when it was clearly installed facepalm-1720
It's been a long day 😅
📣 There's currently no topics on the agenda for todays office hours. If no one adds anything in the next few hours I'm going to suggest that we cancel the sync for this week. 🙂
☝️ Agenda still empty so I'm cancelling the office hours for this week 🙂 See y'all in 2 weeks!
Ok with me, I didn't have anything in particular to discuss. Until next time!
I was wondering how I could build RHCOS images using image builder as Red Hat uses RHCOS for the compute nodes.
We don't currently have any support for Red Hat CoreOS in image-builder. If you'd be willing to work on a PR to introduce it, we'd be very welcoming of that. It would require an understanding of Packer to build the images.
I also wasn't aware that RedHat had their own CAPI provider. Or have I misunderstood your use?
Hello @Ashutosh, what is your goal ?
let’s first resume some few points:
Hello @Christophe Jauffret Thank you for your comment. My goal is to automate the creation of new clusters and reduce the human errors that could happen when using the IPI method of installing a cluster. Today with IPI every cluster has to be installed manually and this is error prone. Cluster API will not just make the installation of new clusters less cumbersome but also allow updating and upgrading these clusters in a standardised way. So my goal is to use RHCOS in a CAPI context. But doesn't the OpenShift IPI automatically provision VMs in Nutanix with RHCOS? So I was wondering why RHCOS wasn't listed although RHEL is listed. My goal was to test this using an OpenShift cluster installed on Nutanix, which will act as the management cluster, and then spin up a new cluster using CAPI.
There are a couple of good-first-issue items in image-builder. If you're looking for an easy way to get involved in the project, we would love your help!
They've both been assigned. I'll try to create some more issues we need help with, and if we don't have a PR for the first one soon we can reassign it.
Huge thank you for filling out the SIG survey!
Thanks especially to @Marcus Noble for making sure it got done. 🙂
tl;dr I don't know of another way to do it, but I'm far from a Packer expert, so we should see if other image-builder folks have ideas.
Hello folks, are you able to build Ubuntu 22.04 for vSphere? I see it's "waiting for ssh" on the vSphere VM. In the console of the VM, it's either asking for manual inputs starting from selecting a keyboard, or like the one below
registry.k8s.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.19

Maybe you can share some more details, it's hard to help just knowing it fails 😛
I am trying to rebuild a CAPI image for k8s 1.26.6 using this values:
{
"kubernetes_semver": "v1.26.6",
"kubernetes_deb_version": "1.26.6-00",
}
however the make is failing since it cannot find kubectl/kubeadm 1.26.6-00.
Where do I get the list of available values for kubernetes_deb_version?
Try apt-cache madison kubeadm on a system where the repo is available. This will yield the versions available.
if I do that the 1.26.6-00 version is available, but make fails with a "cannot find the 1.26.6-00" version of the package
for instance, the default version now is:
"kubernetes_deb_version": "1.26.7-1.1"
which I cannot find with apt-cache
Are you using the new apt repos on the machine on which you're running apt-cache?
Cool 🙂 I suspect it may be the change in repos which is why you're not seeing it. Let me know if that doesn't work.
thank you, I had the "wrong" repo where all the versions are -00
Glad that's worked 🙂. It was a fairly recent change so could be easily missed.
I could build the Ubuntu 23.04 image, however, it looks like cloud-init is not working well... no DHCP, no user created, no kubeadm
I could log into the machine by creating a user by hand, and if I run dhclient it gets an IP
I could mount the cloud-init disk /dev/vda, and I see it has an openstack dir with everything inside it
We've got the image-builder office hours later today but currently the agenda is empty. I'm not aware of any topics that need discussing and if there's nothing added by 1 hour before the call I plan to cancel it and meet up in a couple weeks for the next sync instead.
Agenda empty. Cancelling office hours 🙂 See y'all in 2 weeks!
I asked this on the PR but prob best to chat here...
Do you happen to know if the E2E environment changes were announced anywhere? Is there something we should be keeping an eye on for potential problems?
It's end of day for me and I'm not in a rush to have it done right now so I might give it a little time (e.g. until Monday) for people to speak up with any changes they want to get in.
Right?! I have a habit of breaking clusters on Friday 😅
I'm going to start the process of getting v0.1.20 released
Image-builder v0.1.20 is now available:
Thanks to all contributors!
ansible-galaxy -vvv collection install \
    community.general \
    ansible.posix \
    'ansible.windows:>=1.7.0' \
    community.windows

community.general is failing for me when running hack/ensure-ansible.sh
Maybe we should open an issue to change the ensure-ansible.sh script to also check the version.
I don't think we want to use the old-galaxy do we? The solution we want for image-builder is version checking with a user-friendly error if found to be less than 2.13.9 right?
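Something like this could work for the check. This is only a sketch of the idea discussed above; the helper names are made up, and 2.13.9 is the minimum version mentioned in this thread:

```shell
#!/bin/sh
# Sketch of a minimum-version guard for hack/ensure-ansible.sh.
# version_ge and check_ansible_version are hypothetical helper names.

version_ge() {
  # true if $1 >= $2, comparing as version strings
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

check_ansible_version() {
  required="2.13.9"
  if version_ge "$1" "$required"; then
    echo "ok"
  else
    echo "ansible-core >= $required is required, found $1" >&2
    return 1
  fi
}

# In the real script the current version would be parsed from the
# first line of `ansible --version` before calling check_ansible_version.
```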
Oh sorry, this is an external change that is now causing issues?
I am using Cluster API with KubeVirt and kubeadm... question: Is there any documentation on what exactly the image needs to look like to work with Cluster API? I've built a few images with image-builder that are not working and I am not sure how to troubleshoot... I am not sure if there is a package missing, a service that was not started, or a config file... Is there any document about the requirements on the image?
Hey Marcelo, it might be useful to explain what error you're seeing and what vars / make target you're using, etc.
I don't have any experience with Kubevirt so might not be able to help you myself but might be able to point you in the right direction.
I can't help without the information I asked for.
let's go with the easier one... I've copied ubuntu 22.04 to 23.04, and I am running qemu-ubuntu for it
but kubeadm doesn't bootstrap on it, the control plane doesn't come up
export PACKER_FLAGS="--var 'kubernetes_rpm_version=1.27.6-0' --var 'kubernetes_semver=v1.27.6' --var 'kubernetes_series=v1.27' --var 'kubernetes_deb_version=1.27.6-00' --var 'disk-size=6144'"
Oh right, so you're trying to build a new OS that we don't yet support?
I think that is the easy one... I've used the same flags, to build-qemu-flatcar
I can troubleshoot both... those are the ones that I've been trying so far
Gotcha!
Ok, so hard to say what could be going on. If you haven't done so already I recommend taking a look at the boot log to see if there are any failures that stand out in there (e.g. failure to pull something from the internet or start a service). If that all looks ok I'd suggest taking a look at journalctl -f and look out for services (specifically kubeadm related) that are failing to start.
Did the build log output have any warning that might be related?
my best clue, when it comes to Ubuntu 23.04, is that it might have something to do with the fact that the NIC name changes... it's not enp*, it's more like ens*
and I see there is a netplan file that image-builder uploads over there, which I think might be related to the problem
Oh yeah that could possibly be the problem. Have you tried updating this file? https://github.com/kubernetes-sigs/image-builder/blob/c70b22baba58b89cf5d6561bc8e2[…]nsible/roles/sysprep/files/etc/netplan/51-kubevirt-netplan.yaml
my system is a little bit complex, but I'm importing the image right now into my KubeVirt Cluster to test
🤔 Need to have a think about how best to handle this for different OS's
gotta figure out what went wrong in the build process, to have it fixed
but I think the trick was done by running sudo cloud-init clean --machine-id
I might have made a lot of mess over here, but it's working
so, for now, I have a guess it has something to do with the cloud-init clean, and not the netplan file
because all my other images work, and they don't have the netplan file, as I'm building a plain simple qemu image
👍 I'm about so can join and see if there's anything to discuss 🙂
when building 1.28.2 image based on v0.1.20 release, I meet:
amazon-ebs.{{user build_name}}: TASK [include_role : kubernetes] *
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Add the Kubernetes repo key] *
amazon-ebs.{{user build_name}}: changed: [default]
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Add the Kubernetes repo] *
amazon-ebs.{{user build_name}}: changed: [default]
amazon-ebs.{{user build_name}}:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Install Kubernetes] *
amazon-ebs.{{user build_name}}: fatal: [default]: FAILED! => {"changed": false, "msg": "No package matching 'kubelet' is available"}

Any idea? Should I change some registry source or something, or downgrade something? Thanks!
my config file:
/Users/yikew/Working/capa/image-builder/1.28.2/config.json
{
"kubernetes_series": "1.28",
"kubernetes_semver": "v1.28.2",
"kubernetes_rpm_version": "1.28.2-0",
"kubernetes_deb_version": "1.28.2-00",
"kubernetes_source_type": "pkg",
"kubernetes_http_source": "",
"kubernetes_rpm_repo": "",
"kubernetes_rpm_gpg_key": "\" \"",
"kubernetes_rpm_gpg_check": "True",
"kubernetes_deb_repo": "\" kubernetes-xenial\"",
"kubernetes_deb_gpg_key": "",
"kubernetes_container_registry": "registry.k8s.io",
"kubernetes_load_additional_imgs": "false",
"kubeadm_template": "etc/kubeadm.yml",
"containerd_version": "1.7.6",
"containerd_sha256": "20da1f2252d2033594b06e1eb68dd4906ff439f83f1003b7ebacdffcb4b95bdc"
}
it happens when building build-ami-ubuntu-2004 and build-ami-ubuntu-2204
build-ami-centos-7 and build-ami-amazon-2 work fine.
downgrade image-builder to v0.1.17 and then it works on build-ami-ubuntu-2004 and build-ami-ubuntu-2204 finally.
@Yike Wang I think this is because the format of kubernetes_deb_version and kubernetes_rpm_version changed slightly in .
The packages now come from the approved pkgs.k8s.io repository, but Kubernetes is using new tooling to publish. So I think this will work with image-builder v0.1.20:
"kubernetes_rpm_version": "1.28.2",
"kubernetes_deb_version": "1.28.2-1.1",
I followed the new configs in . , but I hit:
amazon-ebs.{{user build_name}}: fatal: [default]: FAILED! => {"changed": false, "msg": "Failed to download key at : HTTP Error 403: Forbidden"}

my config:
{
"kubernetes_series": "1.28",
"kubernetes_semver": "v1.28.3",
"kubernetes_rpm_version": "1.28.3",
"kubernetes_deb_version": "1.28.3-1.1",
"kubernetes_source_type": "pkg",
"kubernetes_http_source": "",
"kubernetes_rpm_repo": " user kubernetes_series }}/rpm/",
"kubernetes_rpm_gpg_key": " user kubernetes_series }}/rpm/repodata/repomd.xml.key",
"kubernetes_rpm_gpg_check": "True",
"kubernetes_deb_repo": " user kubernetes_series }}/deb/",
"kubernetes_deb_gpg_key": " user kubernetes_series }}/deb/Release.key",
"kubernetes_container_registry": "registry.k8s.io",
"kubernetes_load_additional_imgs": "false",
"kubeadm_template": "etc/kubeadm.yml",
"containerd_version": "1.7.6",
"containerd_sha256": "20da1f2252d2033594b06e1eb68dd4906ff439f83f1003b7ebacdffcb4b95bdc"
}

Do you have any idea?
Try changing the kubernetes_series parameter to v1.28, with the leading v. That should fix it.
curl -ILs -o /dev/null -w "%{http_code}" https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/1.28/deb/Release.key
403
curl -ILs -o /dev/null -w "%{http_code}" https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.28/deb/Release.key
200
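In other words, the series needs the leading v when it's interpolated into the repo URL. A tiny sketch of the idea (the URL layout here follows the prod CDN paths from the curl checks above, using the public pkgs.k8s.io entry point; `deb_repo_url` is a hypothetical helper, not part of image-builder):

```python
def deb_repo_url(kubernetes_series: str) -> str:
    """Build a pkgs.k8s.io deb repo URL. The series must carry a leading
    'v' (e.g. 'v1.28') -- without it the CDN answers with HTTP 403, as the
    curl checks above show."""
    series = kubernetes_series if kubernetes_series.startswith("v") else "v" + kubernetes_series
    return f"https://pkgs.k8s.io/core:/stable:/{series}/deb/"
```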
I always used 1.28 as kubernetes_series without problems, never noticed it. Thank you! @Abhay Krishna Arunachalam it works!
But for the Amazon image (no_proxy=* make build-ami-amazon-2), there is no satisfying cri-tools found in the new repositories:
amazon-ebs.{{user build_name}}: TASK [kubernetes : Install Kubernetes]
amazon-ebs.{{user build_name}}: fatal: [default]: FAILED! => {"changed": false, "changes": {"installed": ["kubelet-1.28.3", "kubeadm-1.28.3", "kubectl-1.28.3", "kubernetes-cni-1.2.0"]}, "msg": "Error: Package: kubeadm-1.28.3-150500.1.1.x86_64 (kubernetes)\n Requires: cri-tools >= 1.28.0\n Available: cri-tools-1.25.0-1.amzn2.0.1.x86_64 (amzn2-core)\n cri-tools = 1.25.0-1.amzn2.0.1\n Available: cri-tools-1.26.1-1.amzn2.0.1.x86_64 (amzn2-core)\n cri-tools = 1.26.1-1.amzn2.0.1\n Available: cri-tools-1.26.1-1.amzn2.0.2.x86_64 (amzn2-core)\n cri-tools = 1.26.1-1.amzn2.0.2\n", "rc": 1, "results": ["Loaded plugins: extras_suggestions, langpacks, priorities, update-motd\n227 packages excluded due to repository priority protections\nResolving Dependencies\n--> Running transaction check\n---> Package kubeadm.x86_64 0:1.28.3-150500.1.1 will be installed\n--> Processing Dependency: cri-tools >= 1.28.0 for package: kubeadm-1.28.3-150500.1.1.x86_64\n---> Package kubectl.x86_64 0:1.28.3-150500.1.1 will be installed\n---> Package kubelet.x86_64 0:1.28.3-150500.1.1 will be installed\n---> Package kubernetes-cni.x86_64 0:1.2.0-150500.2.1 will be installed\n--> Finished Dependency Resolution\n You could try using --skip-broken to work around the problem\n You could try running: rpm -Va --nofiles --nodigest\n"]}

Should I file an issue somewhere?
Is there anyone with an understanding / experience with cloud-init in Ubuntu that might be able to help out @Shalin Patel with this issue - ? It looks like something changed in the 23.3.1-0ubuntu1~20.04.1 release of cloud-init that broke builds for cluster-api-provider-aws. 😞
I'm not familiar enough with it to be able to say what's changed to be honest. If I get chance this weekend I'll have a look but it does look like something in the package has changed to cause this since they can downgrade and get it working.
Yeah exactly. But I've also no clue about that package so was hoping to get some insight from the hive mind of Slack 😄
Thank you for bringing it to attention. This issue was reported and observed by us in CAPA. CAPA members are looking into it.
I built an OVA image (flatcar-stable-3602.2.1-kube-v1.24.12) and tried to import it to vSphere. I hit this issue
Issues detected with selected template. Details: - 51:7:VALUE_ILLEGAL: Value ''VirtualSCSI'' of ResourceSubType element not found in [lsilogic, lsilogicsas]. - 94:7:VALUE_ILLEGAL: Value ''3'' of Parent element does not refer to a ref of type DiskControllerReference.
# upstream image
# my image
This is the PR that introduced the change:
@Yiyi Zhou as the author of that PR are you able to offer any insight?
@Erkan Erol Can you confirm what version of vSphere you're working with? If I'm understanding this page correctly it looks like otherLinux64Guest was introduced in v5.0. But I'm just trying to search for related things and don't have much actual insight into vSphere stuff as you know 😅
Yeah otherLinux64Guest is introduced since 5.0. If this change is breaking backward compatibility, I will file to revert.
The version of vSphere I use is 7.0.3 and it gives the weird error I mentioned above for images that contain otherLinux64Guest. Interesting. @Yiyi Zhou Have you ever tried to upload an image to vSphere after this change? What could I be missing?
Weird 😕 I guess if the linux-64 one works I think you can set vsphere_guest_os_type in your provided vars to override it. I haven't actually checked through the code but that PR seems to suggest that would work. 🙂
Actually, let me double check to be sure. 1 min.
I didn’t see anything in the python script that respects this variable.
Which python script? It looks to be passed to Packer as guest_os_type in packer-node.json
Yeah just looking at that and I'm now even more confused 😕 Do you know where that build data comes from? For Flatcar I would have expected it to be other3xLinux64Guest based on the link I posted above, but there's no key in the map for that so I would expect it to end up with an empty string for the OS type
Yeah not sure how to handle that. Any thoughts?
I am trying to understand why the image doesn’t work for me when my vSphere is 7.0.3. Maybe I am missing something.
To support both cases, we need to extend images/capi/hack/image-build-ova.py a little bit. @Yiyi Zhou What do you think?
Hi again. I checked this issue in detail again. As far as I understand, the problem is flatcar specific. Flatcar should be mapped to other3xLinux64Guest instead of otherLinux64Guest
"guest_os_type": "{{user vsphere_guest_os_type}}",
OS_id_map = {"vmware-photon-64": {"id": "36", "version": "", "type": "vmwarePhoton64Guest"},
"centos7-64": {"id": "107", "version": "7", "type": "centos7_64Guest"},
"centos8-64": {"id": "107", "version": "8", "type": "centos8_64Guest"},
"rhel7-64": {"id": "80", "version": "7", "type": "rhel7_64Guest"},
"rhel8-64": {"id": "80", "version": "8", "type": "rhel8_64Guest"},
"rockylinux-64": {"id": "80", "version": "", "type": "rockylinux_64Guest"},
"ubuntu-64": {"id": "94", "version": "", "type": "ubuntu64Guest"},
"flatcar-64": {"id": "100", "version": "", "type": "otherLinux64Guest"},
"Windows2019Server-64": {"id": "112", "version": "", "type": "windows2019srv_64Guest"}}
I think it’d be good to eventually refactor that script so that we don’t have those hardcoded values if possible. @Erkan Erol any chance you could open a basic issue for us to look at that in the future?
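One possible shape for that refactor (file name, format, and function name are assumptions, not anything agreed in this thread): keep the mapping in a data file next to the script and load it, so adding a new OS doesn't mean editing image-build-ova.py:

```python
import json

def load_os_id_map(path):
    # Load the OS name -> {id, version, type} guest-OS mapping from a JSON
    # data file instead of hardcoding the dict in the script.
    with open(path) as f:
        return json.load(f)
```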
I just opened an issue. I have some urgent things for my daily job so I don't have time for this issue now. I can check it later.
Just having the issue for now is a big improvement. Thanks for taking the time 😁
Cross posting since this seems related to image builder:
There was a post a little way up about this, and a linked issue:
🙈 Which I see linked to your comment in the thread
@voor Would you mind updating with the latest outcome from your thread to include the things tried and ruled out? Would hate for it to get lost in the noise of Slack.
Hi folks. Is there a provision by which I can build a RHEL kubernetes image for vsphere but instead of using an rhel iso as base, I can use a different format something like a vmdk?
The agenda is empty and several people aren’t able to make it so let’s skip. 🙂
Kubecon week is always a tough time for community meetings, I agree let's skip until Nov. 20th.
I’m currently sat in the contributor summit so I’ll try and convince more people to join us on image-builder! 😄
Hi folks, Sorry for a follow up on this, any help would be appreciated.
Hello
I'm trying to build and use an image for Cluster API to create k8s clusters without direct internet connection, so I'm using Nexus as a proxy. I have successfully created an image with version 1.28.3.
My challenge is that when I generate a cluster.yml for kubectl to apply, the cluster that Cluster API creates is still trying to connect to registry.k8s.io, and that will not work in my setup. I can't seem to find where to change this kubeadm config yaml to use my Nexus proxy.
Have you tried setting the field .spec.kubeadmConfigSpec.clusterConfiguration.imageRepository to your private registry?
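For reference, a minimal sketch of what that might look like on a KubeadmControlPlane (the registry hostname below is a placeholder for your Nexus proxy):

```yaml
# Fragment of a KubeadmControlPlane manifest; only the relevant field shown.
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      imageRepository: nexus.example.internal/registry.k8s.io  # placeholder
```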
Yes I have configured the ansible template to point to my own repo. It is somehow sadly ignored or later overwritten
I’m currently travelling and on mobile so not got a link to hand but there was a very similar question a couple months ago in this channel that you might be able to find. Hopefully it contains the answer you need but I’m not 100% sure.
I think this is the thread Marcus is referring to.
https://kubernetes.slack.com/archives/C01E0Q35A8J/p1678141651934979
I'm going to get a new release published so the latest fixes for vsphere are available to use. Speak up now if there's something you're also wanting to get included that isn't yet merged.
Kicking off the process now. If there's some other changes we need we can always do another release 🙂
@mboersma I'm confused slightly about one step of the release process:
Hey @Marcus Noble, sorry I was out for a few days so didn't see this thread until now.
Yes, I don't see the jobs in testgrid either, although I've learned that the images will show up in staging anyway, I know the jobs did show up at one point after we nailed down the release process...I'm not sure what broke.
The images are built as expected, I just can't find any logs / status showing them being built 🤷
Image-builder v0.1.21 is now available:
Thanks to all contributors! 💙 🎉
The image-builder office hours are due later today but there is currently only one item on the agenda (the announcement of the new release just above this post). If you have any topics you'd like to discuss please add them to the agenda. If it's still empty by the time the automated reminder message gets posted in here I'll conclude we have nothing to discuss this week and cancel it for today. 🙂
☝️ Agenda still empty so I'm going to cancel for today! 🙂
I guess the published node images don't have passwords or ssh keys, and they're expected to be used with cluster-api infra providers, which set up the required credentials for ssh access.
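If you do need ssh access, the kubeadm bootstrap provider lets you inject a user yourself. A minimal sketch (the user name and key are placeholders):

```yaml
# Fragment of a KubeadmControlPlane / KubeadmConfigTemplate spec.
spec:
  kubeadmConfigSpec:
    users:
      - name: capiuser                               # placeholder
        sudo: "ALL=(ALL) NOPASSWD:ALL"
        sshAuthorizedKeys:
          - "ssh-ed25519 AAAA... you@example.com"    # placeholder
```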
☝️ The agenda is currently empty. Does anyone have anything they'd like to discuss? If not I'm happy to skip, but it's likely to be the last sync until next year.
I don't have any specific topics to discuss, but I can join if something gets added.
Still no topics so I'm going to skip.
Happy holidays and hope you all enjoy the festive period! 🎉
I'm playing nurse to the family today so may not be able to join anyway I'm afraid! I've got some potential updates to nvidia bits incoming soon™ but that can wait until next year
I've also realised that since the DST change that reminder in here is an hour earlier than it should be 🙈
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 2:30PM every other Monday (next occurrence is December 11th), Greenwich Mean Time.
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 3:30PM every other Monday (next occurrence is December 11th), Greenwich Mean Time.
🤦♂️ I now don't know how to get it to start the reminder in two weeks
set up a reminder “from 18th December at 3:30pm Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 9AM every Monday, Greenwich Mean Time.
Nevermind, I'm going to set myself a reminder to create a new channel reminder in the new year 😆
⚠️ I've opened a PR to remove the Docker implementation related code for Windows. It was used with dockershim, which was removed from k/k in 1.24, and relied on a bug in the Docker Windows implementation for its support. This means containerd will be the default (and only supported) runtime for Windows in image builds. I believe most Windows implementations were already using containerd so this should not be an issue
Hi all. If I am using a fresh ubuntu VHD, what else should I add to ensure that it can be used within an azure cluster?
Hello folks, v0.1.21 does not remove the machine ID (/etc/machine-id) while building the vSphere template (build-node-ova-vsphere-ubuntu-2204). Is anyone else facing a similar issue, or am I doing something wrong? This is causing all VMs to be assigned the same IP address. I can see in the build logs that it shows it truncated, but it really isn't:
vsphere-iso.vsphere: TASK [sysprep : Truncate machine id] *
vsphere-iso.vsphere: changed: [default] => (item={'path': '/etc/machine-id', 'state': 'absent', 'mode': '0644'})
vsphere-iso.vsphere: changed: [default] => (item={'path': '/etc/machine-id', 'state': 'touch', 'mode': '0644'})
vsphere-iso.vsphere:
vsphere-iso.vsphere: TASK [sysprep : Truncate hostname file] *
vsphere-iso.vsphere: changed: [default] => (item={'path': '/etc/hostname', 'state': 'absent', 'mode': '0644'})
vsphere-iso.vsphere: changed: [default] => (item={'path': '/etc/hostname', 'state': 'touch', 'mode': '0644'})
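A quick way to check a built template for this regression, given the sysprep role above should leave /etc/machine-id present but empty so systemd regenerates a fresh ID on first boot (a sketch; the function name is made up):

```python
from pathlib import Path

def machine_id_is_truncated(path="/etc/machine-id"):
    # After sysprep the file should exist but hold no ID; if it still
    # contains one, every clone will boot with the same machine ID
    # (and hence can get the same DHCP-assigned IP).
    p = Path(path)
    return p.exists() and p.read_text().strip() == ""
```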
confirmed that building template using image-builder version v0.1.19 works fine.. v0.1.21 has issues
Sure, running some more builds to see where it is going wrong. I will raise an issue after.
If you have something image-builder to discuss, or any question to ask, please add it to the agenda above (or just mention it here in this Slack thread.)
If we don't have any topics, we'll skip until 2024. 🙂
Yes probably @Drew Hudson-Viles but I'm available if anyone has something to discuss.
I suspect a number of people may have already broken up for the break.
I'm also available too. I appear to be one of the few people in my company still working this week 😄
I think you're right @Drew Hudson-Viles; looks like we should skip this meeting.
But two Mondays from now is New Year's Day... Should we wait three weeks and start up again on January 8th?
Let's do it. I hope you have excellent holidays, see you soon!
I was going to suggest the week after as it's unlikely anything major will crop up... but you always expect that over the holidays and it always happens 😛
So yes, let's stick with the 8th.
Have a good break my friend and I'll speak to you after.
We don't have any agenda topics, so let's skip today's office hours. Because two weeks from now is New Year's Day, we'll have the next one on January 8th, 2024.
Happy Holidays everyone!
I think actually the 15th is when the next meeting is scheduled, not today. Sorry about the misinfo.
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 3:30PM every other Monday (next occurrence is January 15th), Greenwich Mean Time.
Hello image-builder maintainers,
Recently, I talked to the SCL leads regarding a proposal for a new sub-project,
A Project for automating image builds using the Kubernetes API (Similar to CAPI but for images)
They mentioned that this is potentially an image-builder evolution.
Therefore, I'm sharing this with you for discussion and brainstorming.
Find my design proposal:
Hey @mcbenjemaa 👋 This is great! I love seeing new ideas like this!
I have some thoughts / questions...
Also, we have the next image-builder office hours on Monday - do you think you'd be able to join and present this there too?
We are collecting thoughts and feedback.
The naming idea comes from a metaphor,
Forging Images
While the Core is an anvil, the infrastructure part is a blacksmith, and the provisioners are tools, like a Hammer.
First of all, as the SCL leads stated, this could be an image-builder evolution, like an image-builder v2.
However, Rebranding is a proposal whether it's approved or not.
Let me answer some of your questions:
> If I'm not mistaken, this would be more a replacement for Packer rather than specifically image-builder, yes? The "Why" in the document seems to conflate the two and I don't think that's quite correct. For example, image-builder provides end users with a known good (ideally) set of configurations for building a VM image for a kubernetes node. With how Forge is described in the doc, it looks like it would be on the end users to come up with and configure the build steps themselves as it would be the CRs that they apply.
This actually means a replacement for image-builder itself, and yes, this will get rid of Packer, but it still uses Ansible playbooks.
> Do you have any thoughts on how variables might be used to allow users to configure things like package versions? I suspect this is what you're hinting at with mentions of Helm and Kustomize but it would be good to have that spelled out in the proposal.
Well, many solutions could be used, like templating with kustomize or helm,
I'd like to know more about the issues you state people have reported with using image-builder - "Image-builder is a bit lazy in updating and upgrading dependencies." and "Users also have asked different questions about the usage." I'm not aware of specifically what these are referring to, so if you have any sources etc. it would be great for us so that we can address them and improve things in this project.
> Templating is not possible for packer templates nor ansible playbooks.
This isn't correct. Packer with HCL allows for templating.
Several of the CAPI provider teams currently run image-builder nightly using Prow to test their applications / pre-test new releases. Have you thought about how Forge might handle that requirement? (I think this is a similar question to the CI/CD pipeline one)
hello everyone!
When trying to upload 2 images from k8s versions (1.26.12 and 1.28.5), I receive this error:
openstack: fatal: [default]: FAILED! => {"cache_update_time": 1705078887, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'kubelet=1.26.12-00' 'kubeadm=1.26.12-00' 'kubectl=1.26.12-00' 'kubernetes-cni=1.2.0-00'' failed: E: Version '1.26.12-00' for 'kubelet' was not found\nE: Version '1.26.12-00' for 'kubeadm' was not found\nE: Version '1.26.12-00' for 'kubectl' was not found\n", "rc": 100, "stderr": "E: Version '1.26.12-00' for 'kubelet' was not found\nE: Version '1.26.12-00' for 'kubeadm' was not found\nE: Version '1.26.12-00' for 'kubectl' was not found\n", "stderr_lines": ["E: Version '1.26.12-00' for 'kubelet' was not found", "E: Version '1.26.12-00' for 'kubeadm' was not found", "E: Version '1.26.12-00' for 'kubectl' was not found"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nPackage kubeadm is not available, but is referred to by another package.\nThis may mean that the package is missing, has been obsoleted, or\nis only available from another source\n\nPackage kubelet is not available, but is referred to by another package.\nThis may mean that the package is missing, has been obsoleted, or\nis only available from another source\n\nPackage kubectl is not available, but is referred to by another package.\nThis may mean that the package is missing, has been obsoleted, or\nis only available from another source\n\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Package kubeadm is not available, but is referred to by another package.", "This may mean that the package is missing, has been obsoleted, or", "is only available from another source", "", "Package kubelet is not available, but is referred to by another package.", "This may mean that the package is missing, has been obsoleted, or", "is only available from another source", "", "Package 
kubectl is not available, but is referred to by another package.", "This may mean that the package is missing, has been obsoleted, or", "is only available from another source", ""]}
openstack:
openstack: PLAY RECAP *
openstack: default : ok=44 changed=34 unreachable=0 failed=1 skipped=190 rescued=0 ignored=0
openstack:
==> openstack: Provisioning step had errors: Running the cleanup provisioner, if present...
==> openstack: Deleted temporary floating IP 'e89775ba-1edc-4d47-821a-bfc94582f064' (209.127.141.191)
==> openstack: Terminating the source server: 50ce0061-5572-4b00-b052-de0422138e0f ...
==> openstack: Deleting volume: d7d0b4c0-f238-4cc4-96f0-25fcd85bbb50 ...
==> openstack: Deleting temporary keypair: packer_65a16bce-18a3-3179-db2a-328434dae677 ...
Build 'openstack' errored after 19 minutes 48 seconds: Error executing Ansible: Non-zero exit status: exit status 2
==> Wait completed after 19 minutes 48 seconds
==> Some builds didn't complete successfully and had errors:
--> openstack: Error executing Ansible: Non-zero exit status: exit status 2
==> Builds finished but no artifacts were created.
make[1]: *** [Makefile:500: build-openstack-ubuntu-2204] Error 1
make[1]: Leaving directory '/builds/magalu-cloud-iaas/k8s/capi-image-builder'
+ clean_up_images
+ echo 'Starting image clean up'
Starting image clean up
+ grep -q -v -e '^$' ./images_list.tmp
+ echo 'No images to delete'
No images to delete

Do you know what it could be?
I’m currently not near my laptop so can’t double check but I suspect it might be related to the Kubernetes package repo change.
What version of image-builder are you using and what does your vars look like?
I was about to suggest the same. I think it is related to the repos and that 1.26 isn't available in the current ones. Can you try a build with 1.27 just to check? I suspect that will work.
Hello @Marcus Noble and @Drew Hudson-Viles !!!
Today we use a clone of image-builder and I believe it is not that up to date.
Can you help me find where the version of image-builder and the vars are defined in this repo?
Hi,
So the releases page will show you the latest versions available. However, one quick check you can do is to check the value of the repo you have defined in your fork - see this PR for the change that went in to update it, and the file in which you can check.
This was updated back in September to reflect the new repos as the old ones were frozen and were expected to be removed in January of this year
@Drew Hudson-Viles For version 1.27 I get the error:
ERROR: python-cinderclient 9.4.0 has requirement requests>=2.25.1, but you'll have requests 2.22.0 which is incompatible.
Installing collected packages: pbr, stevedore, wcwidth, PrettyTable, zipp, importlib-metadata, autopage, pyperclip, attrs, cmd2, cliff, iso8601, oslo.i18n, netifaces, packaging, pytz, tzdata, pyparsing, wrapt, debtcollector, oslo.utils, msgpack, oslo.serialization, os-service-types, keystoneauth1, python-novaclient, rfc3986, oslo.config, python-keystoneclient, requestsexceptions, platformdirs, decorator, typing-extensions, dogpile.cache, jsonpointer, jsonpatch, openstacksdk, osc-lib, python-cinderclient, python-openstackclient
Successfully installed PrettyTable-3.9.0 attrs-23.2.0 autopage-0.5.2 cliff-4.5.0 cmd2-2.4.3 debtcollector-2.5.0 decorator-5.1.1 dogpile.cache-1.3.0 importlib-metadata-7.0.1 iso8601-2.1.0 jsonpatch-1.33 jsonpointer-2.4 keystoneauth1-5.5.0 msgpack-1.0.7 netifaces-0.11.0 openstacksdk-2.1.0 os-service-types-1.7.0 osc-lib-3.0.0 oslo.config-9.3.0 oslo.i18n-6.2.0 oslo.serialization-5.3.0 oslo.utils-6.3.0 packaging-23.2 pbr-6.0.0 platformdirs-4.1.0 pyparsing-3.1.1 pyperclip-1.8.2 python-cinderclient-9.4.0 python-keystoneclient-5.3.0 python-novaclient-18.4.0 python-openstackclient-6.4.0 pytz-2023.3.post1 requestsexceptions-1.4.0 rfc3986-2.0.0 stevedore-5.1.0 typing-extensions-4.9.0 tzdata-2023.4 wcwidth-0.2.13 wrapt-1.16.0 zipp-3.17.0
{
"kubernetes_cni_deb_version": "1.2.0-00",
"kubernetes_cni_http_checksum": "sha256: kubernetes_cni_http_checksum_arch}}-v1.2.0.tgz.sha256",
"kubernetes_cni_http_checksum_arch": "amd64",
"kubernetes_cni_http_source": "",
"kubernetes_cni_rpm_version": "1.2.0-0",
"kubernetes_cni_semver": "v1.2.0",
"kubernetes_cni_source_type": "pkg"
}
{
"crictl_arch": "amd64",
"crictl_sha256": " crictl_version}}/crictl-v{{user crictl_version}}-linux-{{user crictl_arch}}.tar.gz.sha256",
"crictl_source_type": "pkg",
"crictl_url": " crictl_version}}/crictl-v{{user crictl_version}}-linux-{{user crictl_arch}}.tar.gz",
"crictl_version": "{{env CRICTL_VERSION}}",
"kubeadm_template": "etc/kubeadm.yml",
"kubernetes_container_registry": "registry.k8s.io",
"kubernetes_deb_gpg_key": "",
"kubernetes_deb_repo": "\" kubernetes-xenial\"",
"kubernetes_deb_version": "{{env KUBE_VERSION}}-00",
"kubernetes_http_source": "",
"kubernetes_load_additional_imgs": "false",
"kubernetes_rpm_gpg_check": "True",
"kubernetes_rpm_gpg_key": "\" \"",
"kubernetes_rpm_repo": " kubernetes_rpm_repo_arch}}",
"kubernetes_rpm_repo_arch": "x86_64",
"kubernetes_rpm_version": "{{env KUBE_VERSION}}-0",
"kubernetes_semver": "v{{env KUBE_VERSION}}",
"kubernetes_series": "v{{env KUBE_SERIES}}",
"kubernetes_source_type": "pkg",
"systemd_prefix": "/usr/lib/systemd",
"sysusr_prefix": "/usr",
"sysusrlocal_prefix": "/usr/local"
}
for my variables:
So it does look like being out of date is an issue here.
for 1.27 I can see you're getting ERROR: python-cinderclient 9.4.0 has requirement requests>=2.25.1, but you'll have requests 2.22.0 which is incompatible.
For the other info you've provided you are using
"kubernetes_deb_repo": "\" kubernetes-xenial\"",Where as it should now be
"kubernetes_deb_repo": " user kubernetes_series }}/deb/",I changed the:
"kubernetes_deb_repo": " user kubernetes_series }}/deb/",and also the ansible_args.json:openstack: fatal: [default]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n File \"/home/ubuntu/~core/.ansible/tmp/ansible-tmp-1705320189.7624567-59973788767630/AnsiballZ_apt_repository.py\", line 102, in \n _ansiballz_main()\n File \"/home/ubuntu/~core/.ansible/tmp/ansible-tmp-1705320189.7624567-59973788767630/AnsiballZ_apt_repository.py\", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/home/ubuntu/~core/.ansible/tmp/ansible-tmp-1705320189.7624567-59973788767630/AnsiballZ_apt_repository.py\", line 40, in invoke_module\n runpy.run_module(mod_name='ansible.modules.packaging.os.apt_repository', init_globals=None, run_name='__main__', alter_sys=True)\n File \"/usr/lib/python3.10/runpy.py\", line 224, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib/python3.10/runpy.py\", line 96, in _run_module_code\n _run_code(code, mod_globals, init_globals,\n File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_apt_repository_payload_7ka7dvjo/ansible_apt_repository_payload.zip/ansible/modules/packaging/os/apt_repository.py\", line 564, in \n File \"/tmp/ansible_apt_repository_payload_7ka7dvjo/ansible_apt_repository_payload.zip/ansible/modules/packaging/os/apt_repository.py\", line 547, in main\n File \"/usr/lib/python3/dist-packages/apt/cache.py\", line 152, in __init__\n self.open(progress)\n File \"/usr/lib/python3/dist-packages/apt/cache.py\", line 214, in open\n self._cache = apt_pkg.Cache(progress)\napt_pkg.Error: E:Malformed entry 1 in list file /etc/apt/sources.list.d/kubernetes.list (Component), E:The list of sources could not be read.\nConnection to 127.0.0.1 closed.\r\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
Any suggestions as to what it could be @Marcus Noble @Drew Hudson-Viles?
Connection to 127.0.0.1 closed. That sounds like it's possibly a network error. Does this consistently fail the same way if you re-run it?
To add to that, you also have E:Malformed entry 1 in list file /etc/apt/sources.list.d/kubernetes.list (Component), E:The list of sources could not be read, so I suspect many values have changed since you last synced with the project and that changing just those few lines won't be enough.
I'd recommend bringing the latest changes into your project via a sync with the upstream and then if you have any custom requirements on top of that, ensure they are still valid and work with where the project is at now.
If you currently have changes in your main branch, the easiest approach would be to check them out into another branch so that you don't lose anything, sync with upstream and then PR those changes back in.
Alternatively, if you don't have any code changes in your fork and only changes to vars I'd recommend using the container image that we build for each new release. Then you can be sure that the versions of binaries used are ok too.
Hello @Drew Hudson-Viles and @Marcus Noble!!!
Thank you for your help! I followed exactly these updates here and managed to upload my new images.
🎉 That means you were able to successfully build the images? 😄
I'm trying to build QEMU images with image-builder from a GitHub action job,
But I got this error:
The builder qemu is unknown by Packer, and is likely part of a plugin that is
not installed.
You may find the needed plugin along with installation instructions documented
on the Packer integrations page.
Error: Failed to initialize build "qemu"
I'm guessing you're on a newer version of Packer than is supported by image-builder as the qemu plugin was built-in until v1.10.0.
Are you using the provided image-builder container image? If so I suspect we have something wrong in the deps scripts.
v1.9.5 of Packer is the latest supported by image-builder due to the licence change.
You should be able to. I've never done it in a GitHub action but I don't see why it wouldn't work.
There should be a reminder message posted in 1 minute 😛
docker run --name image-builder \
  registry.k8s.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.21 make build-qemu-ubuntu-2204
is the container run, something like that?
Yeah. You'll need to mount in the needed vars files and maybe some env vars, depending on how you're configuring, but that's pretty much it
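Something like this, for example - hedged sketch only, the local/in-container paths and var file name are illustrative, not verified against the image's docs:

```shell
# Illustrative: mount a local Packer vars file into the container and
# point the build at it (in-container path is an assumption).
docker run --rm --name image-builder \
  -v "$(pwd)/my-vars.json:/home/imagebuilder/my-vars.json" \
  registry.k8s.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.21 \
  make build-qemu-ubuntu-2204 PACKER_VAR_FILES=/home/imagebuilder/my-vars.json
```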
I have an update regarding the Forge project proposal:
The SCL leads are not very keen, that both projects coexists.
The fact that people need time to migrate is acceptable, but in order to make it happen there should be some sort of strong intent leading to feature parity in a reasonable time and to a plan with a sustainable deadline for the phase off, and unfortunately it seems to me we are missing both (intent and feature parity expectation)
I know that gaining consensus takes time, but now this smells like “yeah, nice call, but ultimately they will co-exist indefinitely” which is something I’m not really happy about
I know that gaining consensus takes time, but now this smells like “yeah, nice call, but ultimately they will co-exist indefinitely” which is something I’m not really happy aboutI disagree with this. I think having options that suit different needs is totally fine. Just look at all the different ways of creating clusters and the different ways of running Kubernetes. There is no one solution for everyone. 🤷
Yes, I thought we had the idea of experimental or incubator projects to accommodate things like that.
I agree overall that if we want to replace image-builder someday forge has to aim for feature parity, which seems like a lot to do. But I don't think there would be a huge amount of resistance to deprecating image-builder in favor of forge if it met the same requirements.
☝️ In just under an hour we'll have @mcbenjemaa giving an introduction to Forge that he shared a few days ago. Everyone is welcome to join. If anyone has any other discussion topics please add them to the agenda or let me know. 🙂 See y'all soon.
This is an issue opened a month ago, and now we're facing it too. Could someone take a look? I have added a comment with some of my findings.
If I understand your comment it looks like you've found the solution, yes? Would you be willing to open a PR with the change to using nmcli?
Wow that was quicker than I expected 😆 Thanks!
I kept the PR description terse as I have referenced my comment, hope that's okay
Is the Reset network interface IDs task still needed after this change?
I believe so because this change doesn't affect the behavior of that task, it just puts the nmconnection file in place so that the sed command can find it and delete the uuid.
Here are the contents of the file before the change, the Reset network interface IDs removes the uuid
Gotcha! So it makes sure that dir is setup and then the next task is able to do its thing 🙂 👍
exactly, I saw that even without the nmcli change, the /etc/NetworkManager/system-connections dir is present but it's empty as the machine was still using the ifcfg files
I've added my lgtm. I'll leave it for one of the other maintainers to add theirs too once the tests pass if thats cool with you 🙂
On an Azure ubuntu VM node, I'm trying to build a 2004 ubuntu image for bare metal (raw) with full disk encryption support, however, the build gets stuck here:
==> qemu: Retrieving ISO
==> qemu: Trying
==> qemu: Trying
==> qemu: => /root/.cache/packer/48e4ec4daa32571605576c5566f486133ecc271f.iso
==> qemu: Starting HTTP server on port 8529
==> qemu: Found port for communicator (SSH, WinRM, etc): 3319.
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
qemu: The VM will be run headless, without a GUI. If you want to
qemu: view the screen of the VM, connect via VNC without a password to
qemu:
==> qemu: Waiting 10s for boot...
==> qemu: Connecting to VM via VNC (127.0.0.1:5986)
==> qemu: Typing the boot commands over VNC...
qemu: Not using a NetBridge -- skipping StepWaitGuestAddress
==> qemu: Using SSH communicator to connect: 127.0.0.1
==> qemu: Waiting for SSH to become available...
Looks like there are the following kvm processes running:
root 3807270 3807149 3 00:17 pts/0 00:02:03 /usr/bin/qemu-system-x86_64 -smp 1 -drive if=none,file=output/ubuntu-2004-kube-v1.26.7/ubuntu-2004-kube-v1.26.7,id=drive0,cache=writeback,discard=unmap,format=raw -drive file=/root/.cache/packer/48e4ec4daa32571605576c5566f486133ecc271f.iso,media=cdrom -boot once=d -vnc 127.0.0.1:15 -m 2048M -device virtio-scsi-pci,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive0 -device virtio-net,netdev=user.0 -machine type=pc,accel=kvm -bios OVMF.fd -name ubuntu-2004-kube-v1.26.7 -netdev user,id=user.0,hostfwd=tcp::3837_:22
root 3807273 2 0 00:17 ? 00:00:00 [kvm-nx-lpage-re]
root 3807276 2 0 00:17 ? 00:00:00 [kvm-pit/3807270]
I haven't seen this issue before but also not seen image-builder run in Azure -> vSphere myself so not 100% sure if you've hit a known bug or not.
Just to confirm, are you using the latest release of image-builder?
The Waiting for SSH to become available... makes me suspect it might be some form of firewall blocking access. Are you able to confirm that port 22 should be open between Azure and vSphere?
Yes, I use the latest image-builder source code plus the patches from eks-anywhere-build-tooling for image-builder.
The issue may be related to the encryption change I put in preseed-efi.cfg; the process chain is like this (parent down to child):
patches from eks-anywhere-build-tooling for image-builder.What are these? I've never come across these before 😮
They fix some issues and add some improvements to make it EKS-A conformant.
Do you know of anyone who has a working solution to support full-disk encryption?
Not that I've seen. Doesn't look like there's any related issues either.
I know we can switch to full disk encryption when installing ubuntu on an individual machine manually.
However, I would like to do it automatically to create EKS anywhere cluster on bare metal.
Yeah I get that. Not sure what is currently blocking it. I would think that Packer would be possible with full disk encryption but perhaps there's some configuration or flag that we're currently missing.
How can I get familiar with packer and qemu-system-x86_64 quickly?
I've just tried this locally as I'm testing a build for KubeVirt and it looks like on my side it's getting stuck at the Select Language screen. I'll see if I can spot why.
Also, this is just QEMU w/ 22.04-efi but I suspect something similar is happening as it's hanging at the Waiting for SSH to become available prompt.
ok so in my case I thought I'd fat fingered a config but it seems on reboot it's not ejecting the "cdrom" and so it's booting back into the installer. This prevents the next phase from running.
Lemmie see if I can solve this.
This is funky - it seems it's working now and all I've done so far is increase the disk size :double_facepalm: . The efibootorder is obviously doing its job but I wonder if disk space was an issue with the default of 2G... Not certain tbh but it's all I've changed.
Maybe I'm not cut out for this computer stuff 😄
Hi, I've sent you a DM with the details but in case anyone else comes across this, my changes are in this PR.
https://github.com/kubernetes-sigs/image-builder/pull/1389
@Abhay Krishna Arunachalam I'm trying to support full disk encryption with a change in preseed-efi.cfg as follows:
diff --git a/images/capi/packer/raw/linux/ubuntu/http/base/preseed-efi.cfg b/images/capi/packer/raw/linux/ubuntu/http/base/preseed-efi.cfg
index 14cb4008f..fca87df75 100644
--- a/images/capi/packer/raw/linux/ubuntu/http/base/preseed-efi.cfg
+++ b/images/capi/packer/raw/linux/ubuntu/http/base/preseed-efi.cfg
@@ -52,7 +52,12 @@ d-i partman-partitioning/default_label string gpt
d-i partman/choose_label string gpt
d-i partman/default_label string gpt
-d-i partman-auto/method string regular
+#d-i partman-auto/method string regular
+d-i partman-auto/method string crypto
+d-i partman-crypto/confirm boolean true
+d-i partman-crypto/method string luks
+d-i partman-crypto/passphrase password possible
+d-i partman-crypto/passphrase-again password possible
d-i partman-auto/choose_recipe select gpt-boot-root-swap
d-i partman-auto/expert_recipe string \
 gpt-boot-root-swap :: \
@@ -78,6 +83,8 @@ d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
+d-i initramfs-tools/cryptroot-initramfs-tools/verbose boolean true
+
# Create the default user.
d-i passwd/user-fullname string builder
d-i passwd/username string builder
@@ -93,6 +100,9 @@ d-i grub-installer/with_other_os boolean true
d-i finish-install/reboot_in_progress note
d-i pkgsel/update-policy select none
+d-i debian-installer/add-kernel-opts string \
+    "cryptopts=target=root,source=/dev/sda3,luks"
==> qemu: Connecting to VM via VNC (127.0.0.1:5975)
==> qemu: Typing the boot commands over VNC...
qemu: Not using a NetBridge -- skipping StepWaitGuestAddress
==> qemu: Using SSH communicator to connect: 127.0.0.1
==> qemu: Waiting for SSH to become available...
Something seems wrong with the packer-plugin-qemu builder, especially at stepTypeBootCommand.
[k8s 1.29.1] which version must I put in the json for kubernetes_cni_deb_version? I'm failing to determine what is available and what is compatible with this k8s version.
I think you will want something like this:
{
"kubernetes_deb_version": "1.29.1-00",
"kubernetes_rpm_version": "1.29.1-0",
"kubernetes_semver": "v1.29.1",
"kubernetes_series": "v1.29"
}
Those were what I used to build Flatcar on AWS.
If you're using pkgs.k8s.io then this might be helpful to you.
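For reference (hedged - the revision suffixes on pkgs.k8s.io differ from the old repos and vary per release, so double-check against the repo's own package listing), the vars with the new repos would look more like:

```json
{
  "kubernetes_deb_version": "1.29.1-1.1",
  "kubernetes_rpm_version": "1.29.1-150500.1.1",
  "kubernetes_semver": "v1.29.1",
  "kubernetes_series": "v1.29"
}
```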
In general, I was not able to find a page where it's written which CNI version to use with an upstream kubernetes release.
Matt suggested using a container to find out, and I think you'll have to do the same. I like/use this same approach.
Having said that, with the 1.29 series, you'll use the 1.3.0 one as kubernetes_cni_deb_version
thanks @Anurag, that worked for me as well. I used these values:
"kubernetes_cni_deb_version": "1.3.0-1.1",
"kubernetes_cni_http_checksum": "sha256:kubernetes_cni_http_checksum_arch}}-v1.3.0.tgz.sha256",
"kubernetes_cni_http_checksum_arch": "amd64",
"kubernetes_cni_http_source": "",
"kubernetes_cni_rpm_version": "1.3.0",
"kubernetes_cni_semver": "v1.3.0",
"kubernetes_cni_source_type": "pkg",
I'm trying to work out a more general solution to this (see bug #1363) but no luck yet. Ideas are appreciated!
The only change I know I'd like to make is to stop specifying the kubernetes_cni_deb_version and just allow apt to figure out which version goes with the version of Kubernetes we're currently installing. (The user could still override that to choose a specific CNI package.) But I haven't got something similar working for rpms, and it doesn't help the "install from source" path that we also support for k8s+cni.
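On the deb side the idea would look roughly like this (a sketch, not the project's actual tasks; it assumes the pkgs.k8s.io repo is already configured and the version pins are illustrative):

```shell
# Pin only kubelet/kubeadm/kubectl; apt then resolves kubernetes-cni
# from kubelet's dependency instead of us hardcoding its version.
apt-get update
apt-get install -y kubelet=1.29.1-1.1 kubeadm=1.29.1-1.1 kubectl=1.29.1-1.1
```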
☝️ The agenda is currently empty. Does anyone have anything they've like to raise? If not I'm in favour of skipping this week.
I just want to say that I will start a prototype for the Forge project with 2 volunteers before proposing to SCL.
And if anyone is interested in getting involved in the prototype, he is welcome to join.
nothing more 😄
I don't have anyhing in particular to discuss, but I'm happy to be there if anyone does. I added a note about the v0.1.22 release.
If there's nothing else I think I'm going to give it a miss. I have another meeting after it and could do with a break today
How can I separate a certain directory like /var into another partition and make that partition encrypted?
I tried to add a new partition for /var in the file images/capi/packer/raw/linux/ubuntu/http/base/preseed-efi.cfg
However, the output image file doesn't have the new partition.
Actually, in packer-plugin-qemu, it seems that with ubuntu-2004 and ubuntu-2004-efi the output image file is created from the same ISO image, but the output image files are really different (with/without EFI). How are the preseed files or other changes applied to it at all?
Thanks.
However, it actually includes:
d-i preseed/include string ../base/preseed-efi.cfg
And I tried to add partitions like this in my local file packer/raw/linux/ubuntu/http/base/preseed-efi.cfg:
d-i partman-auto/expert_recipe string \
 gpt-boot-root :: \
 1 1 1 free \
 $bios_boot{ } \
 method{ biosgrub } . \
 200 200 200 fat32 \
 $primary{ } \
 method{ efi } format{ } . \
 # 512 512 512 ext3 \
 # $primary{ } $bootable{ } \
 # method{ format } format{ } \
 # use_filesystem{ } filesystem{ ext3 } \
 # mountpoint{ /boot } . \
 5120 20000 -1 ext4 \
 $primary{ } $bootable{ } \
 method{ format } format{ } \
 use_filesystem{ } filesystem{ ext4 } \
 mountpoint{ / } . \
 1024 2048 4096 ext4 \
 method{ format } format{ } \
 use_filesystem{ } filesystem{ ext4 } \
 mountpoint{ /home } . \
 1024 2048 4096 ext4 \
 method{ format } format{ } \
 use_filesystem{ } filesystem{ ext4 } \
 mountpoint{ /var } .
Device Start End Sectors Size Type
output/ubuntu-2004-kube-v1.26.7/ubuntu-2004-kube-v1.26.7p1 34 1050815 1050782 513.1M EFI System
output/ubuntu-2004-kube-v1.26.7/ubuntu-2004-kube-v1.26.7p2 1050816 16678878 15628063 5.5G Linux filesystem
From what I can see, the preseed you have looks fine. I can't see any reason this would not work at a quick glance, and I don't have time to test this locally at the moment but will when I get the chance.
Can you connect via VNC (if that's an option during RAW builds) and see what's happening on each step? Can you see that step running?
Can you also confirm you've changed the line here to match the name of your recipe?
Good catch.
However, after I changed the name back to the original, the new partitions are still missing in the output image file.
@Drew Hudson-Viles Do you want to see if you have the permissions to be able to /approve this PR? (I'm not sure if it's just maintainers or if reviewers are also able to)
I'm sure @mboersma will get to it when he's about 🙂 I'm about to head AFK but once that PR is merged could someone please announce it in the main channel? 🙏 The release is already created -
I was just happy to see someone else had already done the PR with the CVE fix when I got there! 😉
It's not really image builder that would provide this but if it can be done via the preseed/cloud-init then yes it should support it.
In our case, we are using the image (from image-builder) in a tinkerbell action to create an EKS Anywhere cluster on bare metal hosts.
What shall we do if we want to support encryption?
The simplest way is to use the image (with preseed.cfg taken inside already), because we usually use stream-image action to install the OS.
The problem is that, if I add the encryption part in the preseed-efi.cfg, I will get image-builder stuck as I reported on Jan 22 2024
I just filed a ticket for this issue:
Honestly, I'm not sure about this as I've not used preseed to setup encryption before.
When I get time I can take a look into it unless someone else comes up with something to help before then.
However, if it's hanging at waiting for SSH then it suggests an issue with the preseed itself which prevents it completing successfully. This means the VM will never reboot and launch using the generated disk image so that image-builder can proceed with the installation.
can't help massively but i think your VM is prompting for a passphrase on the console and you'll need to automate that in Packer with the keyboard commands prior to it being able to SSH.
Really enjoying the image-builder project, but i’ve started to modify the packer-node.json files for my use-case, which doesn’t feel correct. Is there a way to skip the export of OVA to OVF, and just template instead of the post-processors?
I can see it sort of was addressed here a few years back
but nothing ever became of it, it seems.
I’m not sure if this is what you’re asking exactly, but rather than editing the existing files it’s possible to set the EXTRA_PACKER_VAR_FILES env var pointing to additional var files that layer on top of the built-in ones.
With regards to skipping the OVF. I’m not sure actually. I know when we build CAPV images we just ignore the other files and only copy what we need.
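As a sketch of that layering (the var name follows the message above and the make target is illustrative - I haven't verified either here):

```shell
# Illustrative override file; keys must match existing Packer var names.
cat > /tmp/my-overrides.json <<'EOF'
{ "kubernetes_semver": "v1.29.1" }
EOF
# Layer it on top of the built-in var files for a build:
EXTRA_PACKER_VAR_FILES=/tmp/my-overrides.json make build-node-ova-vsphere-ubuntu-2004
```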
Hi Marcus, i tried copying packer-node.json and using PACKER_VAR_FILES but that didn’t work, i’ll have a go with EXTRA_PACKER_VAR_FILES instead?
I only see two options really:
I’m currently not at my laptop so can’t check but do you know if it’s possible to have packer not do the export? Is it something we could maybe have configured via env var or something? 🤔
Doesn’t look like it; i mean the post-processing i thought about adding some env var logic on there, but the packer plugin doesn’t look like it’s able to disable the export option once set, i.e. enable true/false
IIRC I had a look at something similar, and the current Packer config written in JSON comes with limitations
switching to HCL would allow to use dynamic blocks for example, to switch the export on/off in your case
I'm doing the same to add support for vApps, and for now I apply a patch to the packer-node.json file in the CI before calling the makefile. Dirty.. but it works 😛
Yeah i think i’ll probably do this, at least it means maintaining a copy of the packer-node.json for now, but i see the only way this would work is if the vsphere packer plugin would support export.enable true/false
I don’t know much about kubevirt. Do you know what VM image type it needs? What makes you think that target isn’t enough?
build-qemu-flatcar isn't enough, it does not support ignition, only coreos-cloudinit. That's not such a problem, but coreos-cloudinit does not handle the # jinja template string on the first line of the userData generated by the image-builder machinery
I've tried the qemu-flatcar target with OEM_ID=kubevirt, ..it's there:
test /home/core # cat /usr/share/oem/grub.cfg
set oem_id="kubevirt"
but still not working..
It's there also:
test /home/core # cat /proc/cmdline
rootflags=rw mount.usrflags=ro BOOT_IMAGE=/flatcar/vmlinuz-a mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 flatcar.first_boot=detected flatcar.oem.id=kubevirt verity.usrhash=d8aba28f890e180820484397bf8fd4ea722445662d25e7a2139360f12f74fa58
So the OEM is correctly being set but the images still aren’t working for you? What’s not working exactly? An error or just not able to boot at all?
I haven't done the KubeVirt side of things with image builder just yet but plan on doing so this or next week for Ubuntu, so can't help too much on the Flatcar side but there is a kubevirt script in the qemu/scripts directory that may be of use?
☝️ Agenda is currently empty. Is there anything anyone would like to discuss or should we skip?
I was thinking we should discuss EKS-Anywhere, but IDK what else there is to say once I thought about it: we just have to "return to sender" unless it's a bug reproducible in image-builder on its own.
Also I'm reworking the Azure pipelines to just be GH Actions and was wondering if that crossed a line or not. I'll put that on the agenda, should be a short discussion.
I see a magic output image of ubuntu 2004 efi with partitions which don't honor preseed-efi.cfg
I used DEBUG=1 PACKER_LOG=1 to narrow down that the partitions are generated at this step:
==> qemu: Pausing after run of step 'stepTypeBootCommand'. Press enter to continue.
qemu: Not using a NetBridge -- skipping StepWaitGuestAddress
You can try adding PACKERFLAGS=-debug. Anything in $PACKERFLAGS gets passed to the packer build command, and -debug is suggested by .
I’d like to get a new release put out as we’ve had a handful of fixes come in since the last. Is there any active PRs anyone would like to try to get in before I do?
I’m hoping to do a new release tomorrow if there’s no objections?
Fine with me! There is little downside to doing frequent releases IMHO.
Yeah. Just wanted to see if any PRs were ready for review before I do. 🙂
I mainly want to get the python fix for Azure Flatcar images out as we’re currently blocked by that at GS 😜
Yeah, maintainers only 😞 Thanks for trying though 😄 I can wait for the others.
Docs update for when the release is published (also include a shiny new script to automate most of this PR in the future 😉)
Thanks y'all! 🙂
I'm still annoyed by the amount of manual steps needed to do a release. I really wish we could have the whole thing automated just from pushing a new tag. 🤔 Not sure how we'd handle the "wait for promo PR" and "wait for image being pullable" though.
Image-builder v0.1.24 is now available:
Thanks to all contributors! 🎉 💙
The release note and docs are updated after the release is available.
☝️ Starting now 🙂 We've got a few items on the agenda for today.
FYI - The general outcome with regards to image-builder was:
Regarding incoming issues - we’ll send people to the EKS-anywhere project (as they also want to be aware of issues with their CLI) and they will triage and open issues with us as and when needed. If we find this not manageable we can then look into having some of the EKS-anywhere members as a group within image-builder that we can assign issues to.
(cc @mboersma @jsturtevant @kiran keshavamurthy)
Hello! I would like to start saying that I’m already sorry if this is not the correct way/channel to ask, but are other kinds of CRI runtime “supported”? I would like to build an image with CRI-O and PR the project. Thanks a lot!
Only containerd is currently supported as installing it is currently one of the main tasks performed.
I suspect it might be possible to install CRI-O after image-builder has finished and switch to using that by default instead but as far as I know there's no way to configure image-builder to install a different CRI.
I was thinking about editing the ansible playbook to implement an ‘if-then’ logic to support other CRI flavours - this would be like implementing more optional variables and running the correct playbooks. Is this off-limits?
Not at all. We'd welcome such a contribution. Just be aware that I think it would mean a lot of the vars files would need updating for all the providers to be able to support passing in the needed versions, sha's etc.
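As a rough sketch of the ‘if-then’ shape (hedged: the container_runtime var and the task file names here are hypothetical, not existing image-builder vars):

```yaml
# Hypothetical: branch the runtime install on a new optional var.
- name: Install containerd
  include_tasks: containerd.yml
  when: (container_runtime | default('containerd')) == 'containerd'

- name: Install CRI-O
  include_tasks: crio.yml
  when: container_runtime == 'crio'
```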
Ok, then I will work on this implementation and share the results. Hoping to be able to implement something quickly!
It should be possible to configure multiple runtimes in a fairly clean way with ansible. On the windows side we initially had two different runtimes; it's been removed now, but it was not too bad when it was done.
Oh interesting. I wasn't aware of that. Might be worth taking a look through the git history to see how it was handled previously then.
Oh interesting, I’ve already implemented something and I will let you know tomorrow if it works as intended 😅 Should I refer to the ansible playbook for the older implementation?
Np, we can go with what you have and tweak from there
Hi! Right now I’m able to build a qemu-backed image that leverages crio, I will implement gvisor for equal compatibility with containerd and then open a PR to get a review. If I can manage I will also try to prepare multiple versions of builds to get better coverage of OSes
🤔 If we don't have that in base image-builder I think it would be a nice thing to add.
yeah I was thinking about that too. We use an additional role in the EKS-A repo for copying files.
/cc @Drew Hudson-Viles you might be interested in this ☝️
I'm not aware of any approach we have at the moment. It's certainly something we can look into though!
I suspect we can likely copy over that role without much trouble.
Actually, I found a way to add it with the following:
That will work in a fork such as yours however it's not something that's easily adaptable for people using the core code so we will look into this.
I've raised this here anyway and will look into this as soon as I can
I noticed that a handful of ISO URLs in the Packer config files were returning 404s because of the images being removed from the mirror/release endpoint. I opened a PR to fix that and also update to latest point releases for some others. I was also thinking of switching the ubuntu 20.04 ISO URLs from the cdimage.ubuntu.com to the old-releases.ubuntu.com domain for consistency with ubuntu 22.04, any concerns with that?
This has been an ongoing pain 😞
Ubuntu doesn't include (or at least didn't use to) the latest release at the old-releases.ubuntu.com endpoint. So if we wanted the latest release we needed to use the one that ended up breaking.
Yeah I have been thinking about how to solve this one too. But in this PR, I'm updating to the latest one available in old-releases, that should be okay right?
Ah nice! I did have an issue for it 😆
Yeah, bumping the versions is always welcome, just remember that the checksums also need updating.
We do have an issue where we'd like to automate this, but it's the checksums that are currently blocking that.
i was wondering if we should have a periodic job for updating ISO URLs
we could compute the checksums on the fly right? although it would take some time given the size of the images
or maybe parse the SHA256 file in the same releases endpoint
Hmm... that's not a bad idea. We might not actually have to calculate them... yeah, exactly that 😆 I feel like we're thinking the same
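The SHA256SUMS file Ubuntu publishes next to its ISOs is just `<hash> *<filename>` pairs, so pulling a checksum out of it is a one-liner. A rough sketch of the idea — the file contents below are fabricated stand-ins (including shortened fake hashes) for what would really come from the releases endpoint:

```shell
# Fabricated sample of the SHA256SUMS format published alongside Ubuntu ISOs.
# Real entries have 64-hex-char hashes; these are shortened fakes.
cat > SHA256SUMS <<'EOF'
aaa111 *ubuntu-22.04.3-live-server-amd64.iso
bbb222 *ubuntu-22.04.4-live-server-amd64.iso
EOF

# Extract the checksum for one specific ISO by matching the filename field.
iso="ubuntu-22.04.4-live-server-amd64.iso"
checksum="$(awk -v want="*$iso" '$2 == want {print $1}' SHA256SUMS)"
echo "$checksum"
```

In the real automation the `cat` heredoc would be replaced by a `curl` against the mirror's SHA256SUMS URL.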
Is there any reason not to switch from the ubuntu-legacy-server ISOs (available at cdimage.ubuntu.com) to the ubuntu-live-server ISOs (available at old-releases.ubuntu.com) for ubuntu 20.04? I was thinking it'd be nice to standardize the ubuntu releases endpoints and not have separate sources.
I don't think so. I suspect it's just a case of different people working on different areas leading to inconsistency
Makes sense, I will try to include that change in my PR
Also I think depending on the old-releases images isn't a bad thing, since they will get upgraded to the latest point release when image-builder runs the dist-upgrade step
Opened this PR to fix/update some ISO URLs and add a script for updating checksums in the future. Would appreciate some feedback on this. Thanks!
Packer allows us to specify the iso_checksum as a URL pointing to a checksums file containing the actual checksum. I propose making that switch instead of hardcoding the checksum, to avoid having to update them each time. I saw we do it for Flatcar Linux, but we should ideally extend it for other OSs too.
I haven't made the proposed change in the PR, but I can update it if it sounds like a reasonable change.
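For anyone curious what that switch would look like: Packer's `iso_checksum` accepts a `file:` URI pointing at a checksum file, and matches the entry against the ISO's filename. An illustrative (untested) invocation — the URL and make target here are assumptions, not the actual change:

```shell
# Hypothetical example: let Packer resolve the checksum from Ubuntu's
# published SHA256SUMS file instead of hardcoding it in the config.
# URL and make target are illustrative.
PACKER_FLAGS="--var 'iso_checksum=file:https://releases.ubuntu.com/22.04/SHA256SUMS'" \
  make build-node-ova-vsphere-ubuntu-2204
```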
Quick question: I'm running into the following error. Did anyone face this error in the past? Any leads?
Build 'vsphere-iso.vsphere' errored after 17 minutes 57 seconds: error exporting vm: ServerFaultCode: Permission to perform this operation was denied.
> error exporting vm
so is this occurring at the end of the image build?
it is failing at this particular step ==> vsphere-iso.vsphere: Goss validate ran successfully
==> vsphere-iso.vsphere:
==> vsphere-iso.vsphere:
==> vsphere-iso.vsphere:
==> vsphere-iso.vsphere: Downloading spec file and debug info
vsphere-iso.vsphere: Downloading Goss specs from, /tmp/goss-spec.yaml and /tmp/debug-goss-spec.yaml to current dir
==> vsphere-iso.vsphere: Executing shutdown command...
==> vsphere-iso.vsphere: Deleting Floppy drives...
==> vsphere-iso.vsphere: Deleting Floppy image...
==> vsphere-iso.vsphere: Eject CD-ROM drives...
vsphere-iso.vsphere: Starting export...
==> vsphere-iso.vsphere: Provisioning step had errors: Running the cleanup provisioner, if present...
==> vsphere-iso.vsphere: Clear boot order...
==> vsphere-iso.vsphere: Power off VM...
==> vsphere-iso.vsphere: Destroying VM...
Build 'vsphere-iso.vsphere' errored after 17 minutes 33 seconds: error exporting vm: ServerFaultCode: Permission to perform this operation was denied.
==> Wait completed after 17 minutes 33 seconds
==> Some builds didn't complete successfully and had errors:
--> vsphere-iso.vsphere: error exporting vm: ServerFaultCode: Permission to perform this operation was denied.
Hi all. Need your help with proxmox image-builder.
Followed this instruction:
But I'm getting "write tcp 192.168.30.33:59042->192.168.30.2:8006: write: broken pipe" error.
And I don't even get how it is supposed to work. The instructions don't provide a password or token secret. I tried adding them as env vars. Nothing changes.
What am I doing wrong?
@mcbenjemaa are you able to help here?
As I can see, this is a network issue happening in your setup.
You will need to rerun or you can actually change the values so you can use an existing ISO.
Rerun didn't help
But I figured out my problem.
PROXMOX_USERNAME is the Proxmox token ID
PROXMOX_TOKEN is the Proxmox token secret
This is very counterintuitive.
Thanks everyone for help
You can refer to the Packer Proxmox builder docs for the syntax and semantics of the username and token fields
Oh, yeah exactly.
Packer plugin uses Username/password and token
But in capi provider we only use Token based authentication.
Hi all, has anyone used govc to export an OVA from one vCenter locally and then import it into different vCenters? It would be very helpful if someone has done this kind of thing so I don't need to reinvent the wheel. I'm trying to use govc to export and import an OVA created using image-builder.
the last image-builder version that works in my CI to build custom vsphere images is v0.1.19. Even the latest one (v0.1.24) just hangs at "Waiting for ssh to be available". Any pointers to troubleshoot this behavior?
If 1.19 is the last that worked for you then I suspect something in this release broke for you: https://github.com/kubernetes-sigs/image-builder/releases/tag/v0.1.20
I see a couple vsphere changes there that might give you some insight into what might be wrong.
It also might be useful if you share what make target you’re using, what vars you’re providing and if you’re using the container image to run image-builder or not.
I can confirm that, at least for me, the latest release was able to successfully build a flatcar image for vsphere.
"# make build-node-ova-vsphere-ubuntu-2204"
It's a gitlab pipeline with the following steps:
I clone the tag, in this case v0.1.19 -> add custom ansible roles -> build docker image -> run the make target using the newly built docker image
I will try using the docker image that you have directly and get back. It's flaky for sure. I have tried running this directly on the host, without any container. Sometimes it works and sometimes it doesn't.
From the VM console, this is where it stops and doesn't proceed. On the CLI it's waiting for SSH.
If I try to create a custom docker image and run the make target from a container built out of it, that's when it hangs waiting for SSH. Any setting I am missing on the docker host or docker build that will help here?
Are you building your docker image on top of ours and then just adding in the ansible roles you need? Or are you building from scratch?
I removed the above step of cloning just a single tag and cloned entire repo. The build happens without issues. It takes some time, but it does build successfully.
Any thoughts on adding Rocky 9 Linux for OVA? -
How do I override the kubernetes version in the output image? I know that with the image-builder command, we can use the "release-channel" option as follows:
image-builder build --os ubuntu --os-version 20.04 --hypervisor baremetal --release-channel 1-28 --firmware efi
How about the command "make build-raw-ubuntu-2004-efi" from the source of ?
With this prefix?
PACKER_FLAGS="--var 'kubernetes_rpm_version=1.28.3' --var 'kubernetes_semver=v1.28.3' --var 'kubernetes_series=v1.28' --var 'kubernetes_deb_version=1.28.3-1.1'"
Hi,
If you look in the packer.json file, you'll be able to see a bunch of variables you can override.
For example the Kubernetes version is overridden using this one: https://github.com/kubernetes-sigs/image-builder/blob/main/images/capi/packer/raw/packer.json#L164.
In your variables file, you can add it and it will override it.
https://github.com/kubernetes-sigs/image-builder/blob/main/docs/book/src/capi/capi.md?plain=1#L97
You can also override them directly with flags.
https://github.com/kubernetes-sigs/image-builder/blob/main/docs/book/src/capi/capi.md?plain=1#L91
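As a concrete sketch of the variables-file route, a minimal vars file overriding just the Kubernetes version could look like this (the version numbers mirror the PACKER_FLAGS example earlier in the thread and are illustrative):

```shell
# Create a minimal Packer vars file that overrides the Kubernetes version
# (values are illustrative).
cat > kubernetes-vars.json <<'EOF'
{
  "kubernetes_semver": "v1.28.3",
  "kubernetes_series": "v1.28",
  "kubernetes_deb_version": "1.28.3-1.1",
  "kubernetes_rpm_version": "1.28.3"
}
EOF

# Then point image-builder at it, e.g.:
#   PACKER_VAR_FILES=kubernetes-vars.json make build-raw-ubuntu-2004-efi
```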
Oh... we're in the fun two weeks of DST mis-match 😆
My calendar entry is correct (in 40 min) but it's an hour earlier than it normally is for me.
The time still works for me though.
I am very much a morning person, but the Monday when DST changes just sucks. 🙂 Ok, see you in 30 minutes.
Looks like when I set up the slack reminder it was during DST
Hello All,
we are trying to build an Ubuntu 22.04 image on vSphere 8.
I attended the office hours meeting today and discussed this problem. There were a few threads which were suggested. I tried everything but it still doesn't work.
Threads I went through
The first one seems to be a somewhat different problem. The second thread matches the problem I am facing. There was a suggestion in one of the workarounds to make changes to the boot_command_prefix. With this, at least SSH becomes available (manually); however, make still gets stuck at Waiting for SSH to become available... . If I go and look at the console, it is stuck at the language selection screen.
I am just thinking, is the boot command for ubuntu22.04 correct or something is missing here.
boot command for 22.04 given in packer/ova/ubuntu-2204.json is,
"boot_command_prefix": "c<wait>linux /casper/vmlinuz ipv6.disable={{ user `boot_disable_ipv6` }} --- autoinstall ds='nocloud-net;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/22.04/'<enter><wait>initrd /casper/initrd<enter><wait>boot<enter>",
That's what I use.
"c
Works well for me.
Your boot command is probably fine too.
You get the language screen after some time if you cannot establish an SSH connection, or if you boot from the install disk after a reboot.
Check SSH related variables and boot order
Did you happen to make any edits in the image-builder configs to make this work? The same vsphere is building the 20.04 OVA just fine.
Hi @snevedomski: Till I get some solution on the vsphere, I thought I will give it a shot with proxmox as it is working for you as is. I created a VM out of the published ISO. However when I run the build command, I hit this issue and I have no idea why its failing. Any hints here will be of great help
==> proxmox-iso.ubuntu-2204: Post "https://:8006/api2/json/nodes/pve/storage/local/upload": write tcp :58174->proxmox_vm_ip:8006: write: broken pipe
Build 'proxmox-iso.ubuntu-2204' errored after 4 seconds 920 milliseconds: Post "https://:8006/api2/json/nodes/pve/storage/local/upload": write tcp :58174->10.206.140.162:8006: write: broken pipe
my build for proxmox also gets stuck on language selection screen and i cannot find a way around it
Hello all!
I opened a PR to upstream a patch we added in eks-anywhere which adds qemu support for RHEL9
Image-builder v0.1.25 is now available:
Thanks to all contributors! 🎉
Hello everyone, inspired by @Ahree Hong's contribution, here is a PR to support raw RHEL 9, both BIOS and EFI. Please take a look.
Really not loving this hacky approach but it does give the latest available one on the releases endpoint
$ curl -L | grep -o 'href="ubuntu-22.04.*-live-server-amd64.iso">' | gsed -e "s/href=\"//g" | gsed -e 's/">//g' | uniq
we could get the value dynamically and jq it into the packer config file before the build
ubuntu-22.04.4-live-server-amd64.iso
$ curl -L | grep -o 'href="ubuntu-20.04.*-live-server-amd64.iso">' | gsed -e "s/href=\"//g" | gsed -e 's/">//g' | uniq
ubuntu-20.04.6-live-server-amd64.iso
Keep in mind, this only addresses the issue of URL availability, not build reproducibility.
But that URL will eventually stop working which would mean image-builder breaks without changes 😞
What I really want is for Canonical to provide an endpoint that we can give a version to and it redirects to the appropriate URL.
> But that URL will eventually stop working
you mean the ISO URL will return 404? or it will point to a bad ISO?
The releases.ubuntu.com only makes the latest available
The above curl command bypasses that by getting the directory listing at the time of build (rather than just before), and setting the ISO URL from that.
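A slightly tidier version of that idea version-sorts the listing so the newest point release wins. The listing below is a fabricated sample standing in for the output of curl against the releases endpoint:

```shell
# Fabricated sample of a releases.ubuntu.com-style directory listing
# (in practice: curl -sL <releases endpoint URL> > listing.html).
cat > listing.html <<'EOF'
<a href="ubuntu-22.04.3-live-server-amd64.iso">ubuntu-22.04.3-live-server-amd64.iso</a>
<a href="ubuntu-22.04.4-live-server-amd64.iso">ubuntu-22.04.4-live-server-amd64.iso</a>
EOF

# Pull out every ISO name, dedupe, version-sort, and keep the newest.
latest="$(grep -o 'ubuntu-22\.04\.[0-9]*-live-server-amd64\.iso' listing.html | sort -uV | tail -n 1)"
echo "$latest"
```

`sort -V` (GNU version sort) avoids the lexical trap where e.g. `.10` would sort before `.9`.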
Oh you mean at time of running the make targets
But then we have the problem that the same version of image-builder could introduce new changes because a new Ubuntu version was released. 😞
in that case, we don't have the issue of 404, but image-builder would still break if the new ISO is broken
I might see if I can find a contact at Canonical to discuss it with them 🤔
we could still pin to the latest releases version (possibly in a Makefile or tag files like we use in EKS Anywhere) but have some automation around it that checks for the latest releases periodically and updates the pinned version. Then at least we have the auditability.
Looks like Canonical have a booth at KubeCon next week so I might try and see if I can find someone appropriate to talk with.
Failing that, I might see if we could get a proxy service hosted in the Kubernetes community cluster to handle this.
Nice! I was looking into other mirrors too, but no one seems to have the combination of old and new releases images (why would they, I guess)
> Failing that, I might see if we could get a proxy service hosted in the Kubernetes community cluster to handle this.
That was the only solution I could think of here: maintain our own redirect URLs. It would put a maintenance burden on the image-builder team, but at least we wouldn't have to update code every time it changes.
If it's something we can run on the community cluster it should be a fairly simple application. But I'd much prefer Canonical actually did it rather than us, if possible.
Hi all, looking for an approver to approve this.
Sorry, I saw your PRs but I'm away at Kubecon this week and haven't had a chance to take a look yet. @Drew Hudson-Viles may be able to take a look?
No worries, apologies for the noise. Wanted to get these two merged because they could be potential blockers for image builder users
I'll be able to take a look shortly, no problem
i was running image-builder replacing the old ubuntu releases URL with . Seems like at the VM level in vsphere it is asking to choose English and press enter. Did anyone run into such an issue?
If it's prompting you then it means the Packer VM did not get the autoinstall configuration and is dropping into the interactive Ubuntu install, which is not desired.
that is strange, all I was doing is passing the ubuntu ISO checksum and URL as environment variables
which seems to be using a floppy disk in vsphere. Is there any option to avoid the floppy disk?
i tried the cd_files option, still it is asking for the English language selection
@mboersma could you give me an /approve here?
I'm afraid I'll be unable to make this one due to a house viewing I need to attend.
The agenda is currently empty so I suggest we skip this one then. (I'm also not feeling too great so happy to go rest instead 😛 )
Thanks 🙂 Pretty sure it's just post-conference grossness 😛
Feel better Marcus! See you both in a couple weeks.
There is a PR to remove VirtualBox support. It was used by sig-windows for a while but is no longer being used or updated. In an effort to clean up image-builder we will be removing it unless anyone objects. Will merge the PR in 2 weeks. Thanks! cc: @Amim Knabben
Version 5.6.0 and 5.6.1 of xz contain a backdoor aimed to bypass SSH authentication: https://lwn.net/Articles/967180/
Those building images should be aware of this and check your recent builds to see if they're vulnerable.
Thanks for posting this - I was going to get around to it shortly but childcare is a thing right now 🙂
Yeah. I haven’t looked into the details too much yet as I’m not at a laptop but I know at least the latest stable Flatcar version is unaffected. Not sure about other distros though.
My understanding of the read I had was that it's unlikely to affect most distros due to it being in newer/pre-release builds, but that doesn't necessarily mean everyone is fine so they should check.
But image-builder performs an update of all packages as part of the build process
It does indeed. I guess if it's in the package-manager - which seems possible, it could be an issue. I'm going to build a ubuntu 22.04 shortly and check it out anyway
I have a cluster running Ubuntu 20.04.6 nodes built with image-builder and here's my output:
/# xz --version
xz (XZ Utils) 5.2.4
liblzma 5.2.4
So at least for Ubuntu as long as people aren’t manually updating packages they should be safe
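A quick audit-helper sketch for checking nodes; the `xz --version` parsing in the trailing comment is an assumption about GNU xz's output format (first line like `xz (XZ Utils) 5.2.4`):

```shell
# Succeeds (exit 0) if the given xz version is one of the backdoored
# CVE-2024-3094 releases (5.6.0 and 5.6.1).
is_backdoored_xz() {
  case "$1" in
    5.6.0|5.6.1) return 0 ;;
    *)           return 1 ;;
  esac
}

# On a node you might feed it the installed version, e.g.:
#   is_backdoored_xz "$(xz --version | awk 'NR==1 {print $4}')" && echo "VULNERABLE"
```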
I'm pretty certain in that long thread it did mention it was pre-release builds getting it. I need to digest it properly as it was a skim whilst watching In The Night Garden 😄
I know Archlinux has been vulnerable but that’s not something we need to worry about
From RedHat:
Fedora Linux 40 users may have received version 5.6.0, depending on the timing of system updates. Fedora Rawhide
From Debian:
Right now no Debian stable versions are known to be affected.
Compromised packages were part of the Debian testing, unstable and
experimental distributions, with versions ranging from 5.5.1alpha-0.1
(uploaded on 2024-02-01), up to and including 5.6.1-1.
I'm not sure about CentOS, Photon or Rocky Linux but I'd guess they're also ok, as I don't think any of those run with bleeding-edge packages.
Hello everyone!
I'm looking for the image builder code where these steps for creating vms and resources are defined, and I can't find it. Can someone help me?
Image-builder Make targets just call the packer build command with the appropriate configuration files/values.
Packer is in charge of creating the VMs for each virtualization platform (vsphere/openstack/ami/nutanix, etc). through its plugins. In this case, the platform is Openstack so the code for creating VMs and resources will be located at . Plugins directly interact with the virtualization platforms through the available SDKs and APIs (in this case, )
hello everyone!
Could you help me with a question? I'm trying to upload some k8s images, and I'm encountering the following error. Is there a way to find out why the process of detaching the volume from the instance is not occurring?
which provider is this? It looks to be an issue with the provider code
hello @jsturtevant
We use the OpenStack provider. I would need to increase the server state retry (lines 5013 to 5016). For the case below, we understand that we need more time to reach the desired state. Is that possible?
My question is if I can somehow change (via ansible, etc.) the retry of the instance state
oh, we kind of are on the same page
what is happening, in my opinion:
the image builder starts to upload the image, but it jumps too fast to the next step
creating/saving/uploading an image could take up to 10 mins
I don't have a solution for this but I can confirm I use the OpenStack builder on an almost weekly basis without any issues.
It looks like your error is similar to what is being seen here:
and here:
How is OpenStack created in your case - if it's using the methods defined above, you may be hitting this bug.
Hello everyone! Thank you for all the contributions!
@Kepler SysAdmin This issue has been fixed in the packer plugin 🙌:
BUGFIX:
RELEASE: V.1.1
If you use packer-plugin-openstack, you will need:
1. Add to the image builder to use a specific version of the packer plugin here:
2. Add variable in dependencies to use this config.pkr.hcl file:
By adjusting this, I was able to upload the images!!!
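For anyone hitting the same thing, pinning the plugin probably looks roughly like the standard Packer required_plugins block below; the source address and version constraint are assumptions to adapt, not the exact fix used here:

```shell
# Write a config.pkr.hcl that pins the OpenStack Packer plugin.
# Source address and version constraint are illustrative.
cat > config.pkr.hcl <<'EOF'
packer {
  required_plugins {
    openstack = {
      version = ">= 1.1.2"
      source  = "github.com/hashicorp/openstack"
    }
  }
}
EOF
```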
Hello, is it possible to easily patch the vsphere ova builders to include an extra 1MB disk when building within the cluster-node-image-builder-amd64 container? So that when deploying with VSphereMachineTemplate I can use spec.template.spec.additionalDisksGiB[] straight away, without first having to edit the template through vCenter...
I know that it is defined within packer-node.json; but I'd rather not override the entire file if possible.
You should be able to provide additional vars files that overlay onto the existing ones in image-builder.
If you mount a JSON file with the vars you need into the container, you can then set the PACKER_VAR_FILES environment variable to point to that vars file.
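Putting that together, a run might look something like this; the image name/tag, file paths, and the in-container home directory are all assumptions to adapt:

```shell
# Hypothetical container invocation: mount an extra vars file and tell
# image-builder to overlay it via PACKER_VAR_FILES. All names illustrative.
docker run -it --rm \
  -v "$(pwd)/extra-vars.json:/home/imagebuilder/extra-vars.json" \
  --env PACKER_VAR_FILES=extra-vars.json \
  registry.k8s.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.25 \
  build-node-ova-vsphere-ubuntu-2204
```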
I've made some progress on the issue we were having with Ubuntu ISO URLs 🎉
It turns out that Ubuntu does have some stable URLs that redirect to old-releases in some situations where needed. I've updated the issue with all the details and I've opened the PR below to update all the ISO URLs we currently reference 🙂 (Except for Ubuntu 23.04 as that currently doesn't have the stable URLs while still in beta 😒)
@Abhay Krishna Arunachalam That update-iso-checksums.sh is fantastic 😄 Made things so easy!
Glad it's finding some use!
For the Ubuntu 20.04 ones, since we're moving from the legacy server install to the live server install, we need to change the boot command and switch from the debian-installer (d-i) to the subiquity autoinstaller (similar to the 22.04 ones).
I hadn't included the above boot command and other changes in my original PR that updated the URLs, and it broke our builds when we started consuming IB v0.1.25 in EKS-A, as the VM couldn't find the kickstart file.
So I set out to try and add the necessary changes as a patch on EKS-A, with the intention of contributing the change upstream. I experimented with every combination of boot command sequence possible but couldn't figure it out for the life of me, so ended up reverting the URL changes to Ubuntu 20.04 alone.
Ah! I did wonder about those 20.04 ones and if they were different. I thought it might be just different host they were pulled from, didn't realise the actual images were different.
I'll revert those ones and just tackle 22.04 for now.
Sounds good! Though it'd be nice to eventually align with 22.04 and get rid of the preseed approach entirely as it's been deprecated since 20.04 came out
Hello there,
I am trying to build images, following the instructions on:
but failing on: openstack: Error waiting for image: Resource not found
logs:
==> openstack: Downloading spec file and debug info
and the var_file.json I am using:
openstack: Downloading Goss specs from, /tmp/goss-spec.yaml and /tmp/debug-goss-spec.yaml to current dir
==> openstack: Stopping server: e8c26074-6928-4ad7-9d6e-562b63501c19 ...
openstack: Waiting for server to stop: e8c26074-6928-4ad7-9d6e-562b63501c19 ...
==> openstack: Terminating the source server: e8c26074-6928-4ad7-9d6e-562b63501c19 ...
==> openstack: Creating the image: ubuntu-2204
openstack: Image: c5bc4f59-2440-4094-9720-fa06ae5802b5
==> openstack: Waiting for image ubuntu-2204 (image id: c5bc4f59-2440-4094-9720-fa06ae5802b5) to become ready...
==> openstack: Error waiting for image: Resource not found
==> openstack: Provisioning step had errors: Running the cleanup provisioner, if present...
==> openstack: Deleted temporary floating IP '71e76a4a-a6e6-4cb2-bc98-9e7c5c6362d3' (178.73.197.24)
==> openstack: Terminating the source server: e8c26074-6928-4ad7-9d6e-562b63501c19 ...
==> openstack: Error terminating server, may still be around: Resource not found
==> openstack: Deleting volume: fdb85e58-1c8a-4d52-bdf2-3fa6720e9b9a ...
==> openstack: Deleting temporary keypair: packer_660f9e6a-fe25-d9cf-6c76-4c2f0149e5d4 ...
Build 'openstack' errored after 10 minutes 8 seconds: Error waiting for image: Resource not found
{
"source_image": "",
"networks": "",
"flavor": "",
"floating_ip_network": "public",
"image_name": "ubuntu-2204",
"image_visibility": "public",
"image_disk_format": "raw",
"volume_type": "",
"ssh_username": "ubuntu",
"kubernetes_version": "1.28.7"
}
Be aware the instance, volume and SSH key get created; where it fails is on:
==> openstack: Waiting for image ubuntu-2204 (image id: c5bc4f59-2440-4094-9720-fa06ae5802b5) to become ready...
I tried from my laptop, and later from a node that is in the same physical network as the OpenStack cloud
==> openstack: Error waiting for image: Resource not found
I've not used OpenStack before but are you able to see image c5bc4f59-2440-4094-9720-fa06ae5802b5 in your OpenStack cloud UI at all?
You could try setting the env var PACKER_LOG=1 to see if you get any more info from the verbose log output.
Is it possibly a permissions problem? The credentials image-builder running with not having permission to create images? (I'm just guessing here)
Is it possible you're facing this issue:
Any errors in the glance api?
thx, I need to dig into this, once I found a solution, I will let you know
Thank you again
No worries 🙂 Hope you manage to track it down
logs:
# kubectl -n openstack logs pod/glance-api-76759946d4-g9d4n -f | grep 72a7187e-43fa-425b-a52a-a56489e04e6d
Defaulted container "glance-api" out of: glance-api, init (init), glance-perms (init), ceph-keyring-placement (init)
[pid: 14|app: 0|req: 6057/48488] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:31 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 957 bytes in 44 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
[pid: 7|app: 0|req: 6058/48489] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:33 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 957 bytes in 44 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
[pid: 10|app: 0|req: 6067/48490] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:35 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 957 bytes in 53 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
[pid: 8|app: 0|req: 6057/48493] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:40 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 957 bytes in 35 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
2024-04-05 09:39:42.286 13 ERROR glance_store._drivers.rbd [None req-65915cac-69f6-45bf-9c50-891063151686 8000aba2f7aa49b6a8d15a44a39f9eb5 ca09816bc9c148cc8e8f79af1068db97 - - default default] Failed to store image 72a7187e-43fa-425b-a52a-a56489e04e6d Store Exception unable to receive chunked part: OSError: unable to receive chunked part
[pid: 13|app: 0|req: 6063/48494] 10.0.1.213 () {40 vars in 1101 bytes} [Fri Apr 5 09:39:42 2024] PUT /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d/file => generated 228 bytes in 582 msecs (HTTP/1.1 500) 4 headers in 184 bytes (1 switches on core 0)
[pid: 11|app: 0|req: 6064/48495] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:42 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 957 bytes in 45 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
[pid: 14|app: 0|req: 6058/48496] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:44 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 957 bytes in 31 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
[pid: 10|app: 0|req: 6068/48498] 188.78.244.135 () {44 vars in 910 bytes} [Fri Apr 5 09:39:47 2024] GET /v2/images/72a7187e-43fa-425b-a52a-a56489e04e6d => generated 139 bytes in 16 msecs (HTTP/1.1 404) 4 headers in 164 bytes (1 switches on core 0)
> Failed to store image 72a7187e-43fa-425b-a52a-a56489e04e6d Store Exception unable to receive chunked part: OSError: unable to receive chunked part
What an incredibly unhelpful error 😞
just changed the logs from info to debug, and ran the image build again
2024-04-05 10:33:26.737 9 ERROR glance.common.wsgi OSError: unable to receive chunked part
if you wonder, this is after:
[pid: 9|app: 0|req: 24/165] 10.0.1.213 () {40 vars in 1100 bytes} [Fri Apr 5 10:33:26 2024] PUT /v2/images/796d1a83-a325-43af-9c95-f1e098f1467b/file => generated 228 bytes in 636 msecs (HTTP/1.1 500) 4 headers in 184 bytes (1 switches on core 0)
/var/lib/openstack/lib/python3.10/site-packages/pycadf/identifier.py:71: UserWarning: Invalid uuid: unknown. To ensure interoperability, identifiers should be a valid uuid.
warnings.warn(('Invalid uuid: %s. To ensure interoperability, '
[pid: 14|app: 0|req: 21/166] 188.78.244.135 () {44 vars in 909 bytes} [Fri Apr 5 10:33:27 2024] GET /v2/images/796d1a83-a325-43af-9c95-f1e098f1467b => generated 957 bytes in 28 msecs (HTTP/1.1 200) 4 headers in 157 bytes (1 switches on core 0)
2024-04-05 10:33:27.792 11 WARNING oslo_policy.policy [None req-20b22860-76c5-4a3a-b592-8ca10b062eed 8000aba2f7aa49b6a8d15a44a39f9eb5 ca09816bc9c148cc8e8f79af1068db97 - - default default] JSON formatted policy_file support is deprecated since Victoria release. You need to use YAML format which will be default in future. You can use oslopolicy-convert-json-to-yaml tool to convert existing JSON-formatted policy file to YAML-formatted in backward compatible way:
Yeah I think this is an issue on the OpenStack side and not image-builder. 😞
Do you know which k8s versions are currently supported by image-builder?
You should be able to override the version in the vars you provide. It should be compatible up to v1.29
@Marcus Noble
by the way,
the image starts to be uploaded:
==> openstack: Waiting for image ubuntu-2204 (image id: 1d34cf8f-ae60-4181-85e0-f0d549d5142c) to become ready...
but it does not last; probably the builder goes to the next step too fast
2024/04/05 12:58:09 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:09 Waiting for image creation status: queued
2024/04/05 12:58:11 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:11 Waiting for image creation status: queued
2024/04/05 12:58:14 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:14 Waiting for image creation status: queued
2024/04/05 12:58:16 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:16 Waiting for image creation status: queued
2024/04/05 12:58:18 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:18 Waiting for image creation status: saving
2024/04/05 12:58:21 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:21 Waiting for image creation status: queued
2024/04/05 12:58:23 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/05 12:58:23 Waiting for image creation status: queued
==> openstack: Error waiting for image: Resource not found
I think Karine is having the same issue:
https://kubernetes.slack.com/archives/C01E0Q35A8J/p1712014119475089
Just to follow up here as well (been out of the loop for a few days unfortunately) I can confirm I use OpenStack remote on an almost weekly basis and I don't have any issues with it functioning in terms of creating an image at the end of a build. As a result I'd suspect this is an OpenStack config issue rather than an image builder one.
Are you able to use the same OS credentials to manually upload an image using the OpenStack cli?
"Are you able to use the same OS credentials to manually upload an image using the OpenStack cli?"
Yes
and how is it deployed - i.e. Kolla, manually using services, etc?
some fresh logs:
==> openstack: Terminating the source server: e68cfe5f-390c-4c5f-b895-861d3f0d47bd ...
2024/04/08 15:12:16 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:16 Waiting for state to become: [DELETED]
2024/04/08 15:12:16 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:16 Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2024/04/08 15:12:19 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:19 [INFO] 404 on ServerStateRefresh, returning DELETED
==> openstack: Creating the image: ubuntu-2204
openstack: Image: 89fdd79b-bf9b-42f8-9b69-c245ce53f945
==> openstack: Waiting for image ubuntu-2204 (image id: 89fdd79b-bf9b-42f8-9b69-c245ce53f945) to become ready...
2024/04/08 15:12:20 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:20 Waiting for image creation status: queued
2024/04/08 15:12:22 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:22 Waiting for image creation status: queued
2024/04/08 15:12:24 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:24 Waiting for image creation status: queued
2024/04/08 15:12:26 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:26 Waiting for image creation status: queued
2024/04/08 15:12:28 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:28 Waiting for image creation status: queued
2024/04/08 15:12:30 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:30 Waiting for image creation status: saving
2024/04/08 15:12:32 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:32 Waiting for image creation status: queued
2024/04/08 15:12:34 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:34 Waiting for image creation status: queued
2024/04/08 15:12:36 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 15:12:36 Waiting for image creation status: queued
==> openstack: Error waiting for image: Resource not found
==> openstack: Provisioning step had errors: Running the cleanup provisioner, if present...
==> openstack: Deleted temporary floating IP '7c1a29c8-1767-4610-8a17-eba3e7cf7c14' (178.73.197.119)
==> openstack: Terminating the source server: e68cfe5f-390c-4c5f-b895-861d3f0d47bd ...
==> openstack: Error terminating server, may still be around: Resource not found
==> openstack: Deleting volume: c5805931-8a3b-42a7-851f-d444d9cd5bef ...
==> openstack: Deleting temporary keypair: packer_66140781-ae67-d1d3-0d7d-9bb69ca0a3c3 ...
2024/04/08 15:12:39 [INFO] (telemetry) ending openstack
==> Wait completed after 8 minutes 6 seconds
2024/04/08 15:12:39 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
2024/04/08 15:12:39 machine readable: openstack,error []string{"Error waiting for image: Resource not found"}
==> Builds finished but no artifacts were created.
Build 'openstack' errored after 8 minutes 6 seconds: Error waiting for image: Resource not found
"Which version of OpenStack are you running out of interest?"
Bobcat
27.0.1 for glance
if you wonder about glance logs:
# kubectl -n openstack logs pod/glance-api-bd89db7b4-5vjvv | grep 89fdd79b-bf9b-42f8-9b69-c245ce53f945 | grep -i error
Defaulted container "glance-api" out of: glance-api, init (init), glance-perms (init), ceph-keyring-placement (init)
2024-04-08 15:12:30.390 14 ERROR glance_store._drivers.rbd [None req-f045c8b7-f9ea-4ccb-9fea-3baed1a680dc 8000aba2f7aa49b6a8d15a44a39f9eb5 ca09816bc9c148cc8e8f79af1068db97 - - default default] Failed to store image 89fdd79b-bf9b-42f8-9b69-c245ce53f945 Store Exception unable to receive chunked part: OSError: unable to receive chunked part
2024-04-08 15:12:33.695 12 ERROR glance_store._drivers.rbd [None req-f045c8b7-f9ea-4ccb-9fea-3baed1a680dc 8000aba2f7aa49b6a8d15a44a39f9eb5 ca09816bc9c148cc8e8f79af1068db97 - - default default] Failed to store image 89fdd79b-bf9b-42f8-9b69-c245ce53f945 Store Exception unable to receive chunked part: OSError: unable to receive chunked part
2024-04-08 15:12:35.242 9 ERROR glance_store._drivers.rbd [None req-f045c8b7-f9ea-4ccb-9fea-3baed1a680dc 8000aba2f7aa49b6a8d15a44a39f9eb5 ca09816bc9c148cc8e8f79af1068db97 - - default default] Failed to store image 89fdd79b-bf9b-42f8-9b69-c245ce53f945 Store Exception unable to receive chunked part: OSError: unable to receive chunked part
ok, that's a new approach to me, so I'm not sure how it's configured out of the gate. However, I know they contributed to my PR which introduced the OpenStack remote approach - I've only got experience with Kolla (and I have infra people who actually spin it up, I just interact with it).
OK, so it works via the CLI - same credential and endpoints etc and it's just the packer approach that's failing - as it has been mentioned this isn't actually image builder itself because by this point it's the OpenStack packer plugin doing the work.
It may be worth reaching out to them in the issues of their helm repo as it could be a simple configuration change that may solve this.
"It may be worth reaching out to them in the issues of their helm repo as it could be a simple configuration change that may solve this"
Who Hashicorp or Vexxhost?
These are the configs (I can change those, let me know):
conf:
  glance:
    DEFAULT:
      log_config_append: null
      show_image_direct_url: true
      show_multiple_locations: true
      enable_import_methods: "[]"
      workers: 8
    cors:
      allowed_origins: "**"
    image_format:
      disk_formats: "qcow2,raw"
    oslo_messaging_notifications:
      driver: noop
{
"source_image": "",
"networks": "",
"flavor": "",
"floating_ip_network": "public",
"image_name": "ubuntu-2204",
"image_visibility": "public",
"image_disk_format": "raw",
"volume_type": "",
"ssh_username": "ubuntu",
"kubernetes_version": "1.28.7"
}
Is it something to do with the image disk format?
The image disk format should be fine - this is what I use in my build.
{
"kubernetes_cni_semver": "v1.3.0",
"kubernetes_cni_deb_version": "1.3.0-1.1",
"crictl_version": "1.29.0",
"kubernetes_semver": "v1.29.2",
"kubernetes_series": "v1.29",
"kubernetes_deb_version": "1.29.2-1.1",
"extra_debs": "nfs-common",
"image_name": "",
"source_image": "",
"networks": "",
"flavor": "",
"attach_config_drive": "true",
"use_floating_ip": "true",
"floating_ip_network": "",
"security_groups": "",
"image_visibility": "public",
"image_disk_format": "raw",
"use_blockstorage_volume": "true",
"volume_type": "",
"volume_size": "12",
"qemu_binary": "",
"disk_size": "",
"output_directory": ""
}
It looks like there is a fix by them
trying the above version
It looks like it's working:
2024/04/08 16:17:19 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 16:17:19 Waiting for image creation status: saving
2024/04/08 16:17:21 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 16:17:21 Waiting for image creation status: saving
2024/04/08 16:17:23 packer-plugin-openstack_v1.1.2_x5.0_linux_amd64 plugin: 2024/04/08 16:17:23 Waiting for image creation status: saving
@mboersma Does CAPZ currently use image-builder to build Flatcar based images and make them available for others to use?
You can build Flatcar images for Azure with image-builder (as you know), but the CAPZ team only publishes reference images for Ubuntu and Windows to the Azure Marketplace.
Right, nothing I can copy then 😆
I find the whole image gallery / image templates / marketplace images thing in Azure really confusing 😅
We want to switch to publishing just to shared image galleries, which would be somewhat simpler (and is also our recommended path for users in our docs). But not there quite yet.
Yeah, that's what we're doing. 🙂
I'm in the process of switching our source image from one in another shared gallery to a marketplace image, but I was having trouble with it being blocked because of billing reasons.
I've finally found the list of approved images which I think solves my problem. 🙂
I'm just now confused as to why Flatcar has 4 different offers on the marketplace and what the difference is between them (see )
I'm really not sure--the Flatcar team does their own publishing (although not specifically for Cluster API).
I can dig around and try to find out. Also maybe @Mateusz Gozdek (invidian) or @Jeremi Piotrowski knows?
There's no rush. I've just successfully built and published based on the corevm offer. I just need to test it out now to make sure it actually works 😆
It would be nice to know if there are differences though.
What's the difference between an Image and a VM image definition?
Image-builder produces both but I'm not sure what is used for what 😕
I'd like to check some of my understanding with Azure...
With image-builder we create a "managed image" from the VM in our subscription. When the destination is a community gallery this "managed image" is used to create the "version" within the image definition of our gallery. Is that correct?
If so, do we still need the "managed image" after we've published the version to the gallery? If it's safe to delete, is there a way to do this with image-builder?
I think your understanding is correct. As far as I can tell (mostly by having read the docs, not actually experimented), you can create image versions and then delete the managed image since the version is used to provision.
I found this post too that suggests you can do this but it might prevent expanding replication to other regions? Or maybe not.
That's fine, we handle the replication as part of the image-builder run. As long as we don't want extra regions after it's built we're ok. (and we could always just re-build if that was the case)
Now, any idea if image-builder / Packer can automatically clean this up?
I did come across this issue which seems to suggest that it is meant to be removing the managed image and was broken at one point but then fixed. But I don't see this happening in my environment 😞
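If Packer isn't cleaning it up, a manual sweep is an option - a rough sketch with hypothetical resource group / image names (not necessarily how your account is laid out):

```shell
# Hypothetical manual cleanup of a leftover managed image once the gallery
# image version exists. Names below are placeholders.
if command -v az >/dev/null 2>&1 && az account show >/dev/null 2>&1; then
  az image delete --resource-group capi-images-rg --name capi-ubuntu-2204
else
  echo "az CLI not available or not logged in; skipping"
fi
```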
Also in image-builder we're still using the 1.x version of the Azure packer builder plugin, not 2.0 yet.
Ok. It's not a huge problem. I was just trying to figure out what was going on in our account. I'm not too bothered with the Images being left behind. Maybe when we finally upgrade they'll magically start being cleaned up and I don't need to think about it 😄
⚠️ Currently all PRs are failing due to an issue with photon-4 packages not matching known keys. This is causing the pull-ova-all tests to fail for everyone.
If anyone happens to know what might have happened to cause this and how to fix it, that would be a huge help. If we don't track down what the problem is in the next couple days we'll remove the photon-4 OS from the test to unblock PRs while we figure out the problem. (Note, this doesn't seem to affect Photon 3 or 5)
Issue:
I'd like to better understand how Packer makes use of SSH keys when building images with image-builder, and I have a couple questions I'm not clear on:
Hi image-builder maintainers! I wanted to know if there's a tentative date for the next release of image-builder v0.1.26.
I want to get https://github.com/kubernetes-sigs/image-builder/pull/1438 merged in first now that it’s unblocked by the failing test then I think we should be good to go.
hello there, back into the CAPI world. I'm trying to build a Flatcar qemu image from the repository but failing miserably 😞 The latest log I'm having is:
2024/04/19 10:52:11 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:11 [DEBUG] Detected authentication error. Increasing handshake attempts.
Tested with a plain ssh connection and same behavior 😞
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 [INFO] Attempting SSH connection to 127.0.0.1:2686...
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 [DEBUG] reconnecting to TCP connection for SSH
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 [DEBUG] handshaking with SSH
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 Keyboard interactive challenge:
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 -- User:
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 -- Instructions:
2024/04/19 10:52:18 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:18 -- Question 1: Password:
2024/04/19 10:52:20 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/04/19 10:52:20 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey keyboard-interactive], no supported methods remain
that's a very old ssh version and you may have protocol issues because of that
or I suggest running the build from a docker container with a more recent ssh client
how would you build from a docker container ? that's an interesting point of view
if you need qemu then run privileged or pass through /dev/kvm
There's currently nothing on the agenda for today's image-builder office hours so unless someone has anything they'd like to bring up I suggest we skip for this week.
I do think it's time for us to do a new release though now that we're unblocked by the failing tests so if anyone has any PRs they'd like to get merged in before we do the next release please speak up now 🙂
Hi,
I am creating a Windows image for OCI. Below is the PACKER_VAR_FILES specification, but I'm not able to build the Windows image. Is there anything missing in this configuration?
{
  "build_name": "windows",
  "base_image_ocid": "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaaiwwre36icxfiivmgqlfdbrjm67igscbikjq4k2luhbjgcwyxiywa",
  "ocpus": "128",
  "shape": "BM.Standard.E4.128",
  "region": "eu-frankfurt-1",
  "compartment_ocid": "ocid1.compartment...",
  "subnet_ocid": "ocid1.subnet.oc1..",
  "availability_domain": "DDJb:EU-FRANKFURT-1-AD-1",
  "user_ocid": "ocid1.user..",
  "fingerprint": "af:66:ce:6b:63:d1:ef:99:97:43:50:36:35:f2:71:f9",
  "tenancy_ocid": "ocid1.tenancy....",
  "key_file": "~/.oci/oci_api_key.pem"
}
/usr/bin/packer build -var-file="/home/ubuntu/image-builder/images/capi/packer/config/kubernetes.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/windows/kubernetes.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/containerd.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/windows/containerd.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/windows/ansible-args-windows.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/common.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/windows/common.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/windows/cloudbase-init.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/goss-args.json" -var-file="/home/ubuntu/image-builder/images/capi/packer/config/additionalcomponents.json" -color=true -var-file="/home/ubuntu/image-builder/images/capi/packer/oci/windows-2022.json" -var-file="/home/ubuntu/image-builder/images/capi/oci.json" packer/oci/packer-windows.json
Error: Failed to prepare build: "oracle-oci"
3 error(s) occurred:
* 'fingerprint' must be specified
* 'security_token_file' must be correctly specified. did not find a proper
configuration for key id
* 'key_file' must be correctly specified. did not find a proper configuration
for private key
Got resolved. Added missing items in packer-windows.json
It’s a public holiday in the UK tomorrow so I’ll be skipping the office hours. There’s currently nothing on the agenda so feel free to skip if nothing comes up. 🙂
any idea why, when I built a Flatcar image, the ssh pub key isn't populated when used with CAPI?
Hello All,
the image-builder works wonderfully,
Now, I know this is just for building images with pre-built k8s.
Is there something similar but with pre-built wordpress?
You would deploy a kubernetes cluster and then run WordPress. https://bitnami.com/stack/wordpress
Hi....
yes, but I need this as an OpenStack image
already have it on a k8s cluster
If you need an openstack image you could look at
Or you could use packer directly for openstack
I was planning to cut a new release today as it's been a while since the last and we've had quite a few PRs merged since then. Any PRs that people would like to try and get in before I do?
ok, I'm going to start a new release now.
@Drew Hudson-Viles
tide Pending — Not mergeable. Needs approved, lgtm labels.
It's ok. I find it weird that lgtm is still needed with approve 😆
tide Pending — Not mergeable. Needs approved label.
So even though I approved, I needed to do /approve... nice, nice. 😄
yeah, tide doesn't use the GitHub PR status. It keeps its own state based on comments / labels
It's... frustrating sometimes 😆
Image-builder v0.1.26 is now available:
Thanks to all contributors! 🎉
@mboersma do you still want to talk about Ubuntu 24.04?
Sure, we can move that to today, but it's a short topic.
@Drew Hudson-Viles are you joining? You have an item on the agenda
Just wondering about the impact of this in IB
amazon-ebs.flatcar-stable: TASK [providers : Install AWS CLI v2] ************************************************************************************
amazon-ebs.flatcar-stable: fatal: [default]: FAILED! => {"changed": true, "cmd": ["/tmp/aws/install", "-i", "/usr/local/aws-cli", "-b", "/usr/local/sbin"], "delta": "0:00:01.552103", "end": "2024-05-23 14:19:56.504588", "msg": "non-zero return code", "rc": 1, "start": "2024-05-23 14:19:54.952485", "stderr": "mkdir: cannot create directory '/usr/local/aws-cli': Read-only file system", "stderr_lines": ["mkdir: cannot create directory '/usr/local/aws-cli': Read-only file system"], "stdout": "", "stdout_lines": []}
amazon-ebs.flatcar-stable:
What was the reason for removing these aws cli changes specifically?
Oops! It was to support Ubuntu 24.04 on AWS, I unified the couple of paths that used to be there to install the AWS CLI (and added gpg checksum validation). I didn't realize that would pull in Flatcar actually.
Thanks Marcus! LMK if you need help or want me to fix my own damage, I'm just waking up but I'd be glad to look into it.
I think I have it already. Haven't tested it yet but I'll get the PR up for you to take a look at
Confirmed working for Flatcar
I'm going to get a new release out with this fix today
@Drew Hudson-Viles as you're about would you mind looking at the above? 🙏 🥺
Thank you kindly sir 🙂
Image-builder v0.1.27 is now available:
This is a small release with just a couple bug fixes in it. 🙂
Thanks to all contributors!
Image-builder v0.1.28 is now available:
Thanks to all contributors!
(Trying to do more frequent, smaller releases. Let's see how this goes 😄)
With the PR for a license exception on Packer getting rejected, what's the plan going forward?
It's unclear. Please see the discussion in our tracking issue:
For right now nothing changes as we're still pinned to the pre-BUSL version but that can't continue forever.
I have also added it to the agenda for next week's office hours, though I'm not sure any solution will come out of that. It'll likely be more to make the current state of things clear.
i think there is another file to clean up on ubuntu images: /etc/cloud/cloud.cfg.d/90-installer-network.cfg - see
i found this by trying to use the 22.04 daily build ISO instead of the 22.04.4 release ISO, and you get revision 5495 vs 5741 of the subiquity snap in those different ISOs
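A sketch of the kind of cleanup this would need - a hypothetical Ansible task, not necessarily how image-builder's sysprep role actually handles it:

```yaml
# Hypothetical cleanup task: drop the subiquity-generated network config
# so the built image doesn't ship installer-time NIC settings.
- name: Remove subiquity installer network config
  ansible.builtin.file:
    path: /etc/cloud/cloud.cfg.d/90-installer-network.cfg
    state: absent
```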
Just catching up on the office hours... some notes:
My couple of cents:
"I also don't think I agree with Fabrizio that Packer isn't a runtime dependency for us. The image we produce isn't "image-builder"; the process of building is image-builder, so that is the runtime in my opinion - but I'm not a lawyer"
This is my understanding, please correct me if I am wrong here. The output of the whole process (image building) is a set of artifacts which are later consumed by CAPX to bootstrap a K8s cluster. We use Packer to bootstrap VMs in providers, apply further changes, and then produce the final artifacts (in the case of vSphere an OVA, AWS an AMI, etc.). The use of Packer here is only for building the artifacts (the final desired output) and not during actual cluster creation (which is what I assume actual runtime means - cluster creation). So I'm not sure why calling Packer a build-time dependency, or its usage for producing the images, is improper.
"The use of packer here is only for building the artifacts"
I agree with this, and my reading of the BUSL license is that's all they care about as well. Hopefully we can get the CNCF to agree.
OH NO I missed this - was on an off-site all week. I am absolutely available for chatting on CAPI sysexts! @tormath1 is also a good peer for this discussion.
From all the CAPI providers I guess the OpenStack folks (CAPO) are most experienced with using sysexts at this point. They use Flatcar + Kubernetes sysext a lot for CI / Testing etc. and they really like it.
That would be interesting to see. uv does seem like a superior tool.
It's mostly a matter of supporting multiple distros cleanly (and Windows). Ideally we stick with "in-the-box" tools packaged by the distro vendor.
We haven't had many actual resolver problems with pip recently, but it's been tricky to install extra packages now that distros want you to use --break-system-packages and --user doesn't always work. Maybe uv would be more flexible here?
Hello everyone!
Is it possible to update the kernel version with image builder?
Not directly. image-builder starts from a source image that includes the kernel and all the basic tools, so for the kernel you'll get whatever comes with Ubuntu 24.04 (for example) as packaged for your cloud provider.
could I get some pointers on the right way to fix the broken Azure tests? I could make the whole grub update conditional on update-grub being installed?
As the agenda is currently empty and at least 2 of the maintainers are unable to make it today we're going to skip the office hours this week.
We don't have any topics in the agenda currently, so let's skip office hours this week. Please reach out on the Slack channel if you have any questions or need support with image-builder.
Hey! I am facing an issue when building an image - the VM gets stuck on language selection screen and it seems cloud-init scripts are not being executed. Any ideas how to fix that?
I don't see apt-get being invoked anywhere on the proxmox build ()
any ideas where i should put that export DEBIAN_FRONTEND=noninteractive?
In the logs I see the following error: "GET /api2/json/nodes/pve-01/qemu/118/agent/network-get-interfaces HTTP/1.1" 500 13, which I assume is caused by the qemu agent not running on the machine
Could someone please confirm that image-builder works only if there is a DHCP server to assign IP addresses automatically in the Proxmox case? It seems that in my case the build process gets stuck due to connectivity issues, as the VM never gets an IP address assigned
Hi! Yes, we use it in the same scenario. No issues at all with a simple image with basic Ubuntu. Given that these images should be used primarily with CAPI, I assume DHCP is always needed. Otherwise you would need to use cloud-init to spin up VMs with already-assigned IPs.
ok, thanks @Nicolò Ciraci! I have spun up a new DHCP server and I see it offered an IP address to ubuntu-server
sorry for dumb question - but any ideas why this could be happening and how to debug that?
packer logs show the following:
2024/07/08 16:15:05 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:05 [INFO] Waiting 5s
2024/07/08 16:15:11 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:11 [INFO] Waiting 5s
2024/07/08 16:15:16 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:16 [INFO] Waiting 5s
2024/07/08 16:15:24 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:24 [DEBUG] Unable to get address during connection step: 500 Internal Server Error
2024/07/08 16:15:24 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:24 [INFO] Waiting for SSH, up to timeout: 2h0m0s
==> proxmox-iso.ubuntu-2204: Waiting for SSH to become available...
2024/07/08 16:15:27 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:27 [DEBUG] Error getting SSH address: 500 Internal Server Error
2024/07/08 16:15:35 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:35 [DEBUG] Error getting SSH address: 500 Internal Server Error
2024/07/08 16:15:43 packer-plugin-proxmox_v1.1.8_x5.0_darwin_arm64 plugin: 2024/07/08 16:15:43 [DEBUG] Error getting SSH address: 500 Internal Server Error
The packer errors are caused by the fact that your instance of Ubuntu is stuck, but I don't see any clear issue in the logs. Is DNS reachable?
yeah, if I restart the machine and log in through the serial console, I am able to manually deploy qemu-guest-agent
The image-builder stack has an additional flag by which you can install extra packages. Something like this: PACKER_FLAGS="--var 'extra_debs=\"qemu-guest-agent\"'" make build-qemu-ubuntu-2204-crio ; you'll need to adjust the command based on your Linux flavour.
OK, so it seems the issue was that my laptop, where I ran packer, was not accessible from the temporary VM
Hey folks, I opened an issue regarding the mandatory pull-ova-all job due to the upcoming changes by test-infra to the prow CI, PTAL:
Agenda is currently empty. Does anyone have any topics they'd like to discuss or shall we skip?
Nothing in particular on my end, happy to skip it or to carry on if someone has a discussion topic.
ok, unless someone shouts up in the next 30 min I'm going to skip. Could do with a rest after work to be honest 🙂
What about the OVA removal that's happening? Worth chatting about or shall we put it off until next time? I've got nothing to add, just wasn't sure if a discussion was needed 😄
But yeah, I'm running on 3 hours sleep thanks to a poorly child so I would appreciate not going on YouTube this week 😄
I don't have much insight into the OVA changes, but it would be good to hear from someone more involved. If we're not up for it this week, maybe we can recruit someone to summarize next time.
Hello Everyone!
I'm trying to build and upload an image for k8s 1.29 and 1.30, and I saw that the cni-plugin was upgraded in those versions (before I used v1.2.0 of the cni-plugin, and from versions 1.29 and 1.30 of k8s the cni-plugin updated to v1.3.0 and v1.4.0), but I get the following error when trying to update the cni-plugin configuration in packer.
The error I get:
openstack: fatal: [default]: FAILED! => {"cache_update_time": 1721186207, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'kubelet=1.29.6-1.1' 'kubeadm=1.29.6-1.1' 'kubectl=1.29.6-1.1' 'kubernetes-cni='' failed: E: Version '' for 'kubernetes-cni' was not found\n", "rc": 100, "stderr": "E: Version '' for 'kubernetes-cni' was not found\n", "stderr_lines": ["E: Version '' for 'kubernetes-cni' was not found"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nPackage kubernetes-cni is not available, but is referred to by another package.\nThis may mean that the package is missing, has been obsoleted, or\nis only available from another source\n\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Package kubernetes-cni is not available, but is referred to by another package.", "This may mean that the package is missing, has been obsoleted, or", "is only available from another source", ""]}
openstack:
openstack: PLAY RECAP **
openstack: default : ok=44 changed=34 unreachable=0 failed=1 skipped=190 rescued=0 ignored=0
openstack:
I think you'll have to set 1.4.0 for kubernetes_cni_deb_version at least, given you're using apt.
Is there a reason to set it to null? Looking at the config I think that should be 1.4.0
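For reference, a minimal sketch of the relevant var-file entries (the exact deb revision suffix, -1.1, is an assumption here - check the actual apt repo for the real one):

```json
{
  "kubernetes_cni_semver": "v1.4.0",
  "kubernetes_cni_deb_version": "1.4.0-1.1"
}
```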
Hey everyone! Quick question: is this PR still functional? In other words, is building an arm64 image on a Mac M1 supported by image-builder?
Hello, I am the creator of the PR. This PR was done a year ago and will probably not work anymore because of the now-nonexistent links to the k8s binaries. Still, with some changes to the URLs, it might work on a native arm64 Linux box. If I remember correctly, the Mac M1 does not actually support KVM for qemu-arm64? If that is the case, you need to use a Linux arm64 box to try out the PR.
We can continue this discussion on the PR GitHub thread, so let me know if you are constrained to running this on a Mac M1. Fortunately, I also own a Mac M1 and I can check it out too.
do the images for kubevirt/qemu support EFI/Secure Boot? I've tried to boot a few of them and failed
follow-up question: is it possible to create an image that supports both BIOS and EFI, as well as Secure Boot? I see cloud images like Ubuntu support that
@Marcus Noble /@mboersma Need help with merging PR. This is regarding Ubuntu-24.04 support for vSphere environment. All the CI jobs are successful. It would be great if we can get this in before end of this month (next two days) - disabling of pre-submit CI for all OVA issues
cc: @chrischdi @rajas
Thanks @mboersma @Drew Hudson-Viles for the approval and merge in time...
Hey folks,
a question regarding the build process as it's implemented in image-builder.
not really related to the image-builder project itself - but I see a large user base here --> more distributed knowledge 🤞
What's the main reason why image-builder doesn't build a generic raw image and convert it later into a qcow2 (OpenStack), an AMI (AWS) or a VHDX (Azure)?
just asking because:
qemu-img is able to convert images. Or is it for historical reasons - that no generic build was implemented because, in the beginning of image-builder, it was easier to build on the target cloud providers directly?
I think one point is e.g.: For azure you may want to have different config for e.g. cloud-init or tools installed e.g. the azure cli, compared to when running on openstack.
There are some ansible tasks which are run depending on the target which gets built.
thank you Christian, yes that's a valid point.
And if I take a look at the provider-specific ansible tasks, they seem to be valid from an image-builder perspective (like you said, having azure-cli in there or the SSM agent for AWS).
but if you do not really need these, do you know of any limitation when it comes to conversion via qemu-img convert?
Nope, not really. If the image still has everything you need, I don’t see a reason why it should not work, except maybe hypervisor specific things at the end, but if it runs it runs 🙂
Note: I’m not very familiar with aws and azure image building.
🤔 I wonder if this is a good approach for us to try out switching to using systemd sysext. Instead of trying to update all existing targets we create a new one specifically for sysext that builds a raw image with optional conversion at the end for each cloud provider. That convert stage could also possibly layer on provider-specific sysext packages like azure-cli.
Giving my 2 cents, this is how Flatcar is actually built: there is a generic image build then it's converted to cloud providers VM images with specific bits (image format, OEM tools provided as sysext, etc.)
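For reference, the raw-then-convert flow being discussed can be sketched with qemu-img (file names are placeholders; Azure additionally wants a fixed-size VHD, and each cloud still needs its own upload/registration step afterwards):

```shell
# Sketch: build one generic raw image, then convert per provider.
if command -v qemu-img >/dev/null 2>&1; then
  qemu-img create -f raw node.raw 64M               # stand-in for the generic build output
  qemu-img convert -f raw -O qcow2 node.raw node.qcow2   # OpenStack
  qemu-img convert -f raw -O vpc   node.raw node.vhd     # Azure (needs fixed-size VHD)
  qemu-img convert -f raw -O vmdk  node.raw node.vmdk    # vSphere
  qemu-img info node.qcow2
else
  echo "qemu-img not installed; skipping"
fi
```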
Image-builder v0.1.31 is now available: https://github.com/kubernetes-sigs/image-builder/releases/tag/v0.1.31
Thanks to all contributors! 🎉
PTAL, there's a bug in acquiring GCP projects that is causing leaked CI projects
Has anyone been able to successfully build the latest 3975.2.0 release of Flatcar for CAPV using make build-node-ova-vsphere-flatcar?
I'm getting stuck at the "Waiting for IP..." stage but currently don't have access to vcenter to debug the issue 😞
(The exact same image-builder setup works fine with Flatcar 3815.2.5 so I suspect something has changed in the new release that we need to handle)
what could help for debugging (also in future) is try to get a screenshot of the vm to the artifacts using govc vm.console -capture screen.png my-vm
I don't currently have access to vcenter. Needing another team to take a look for me.
I’m planning to build an image next week for CAPV with v1.31 when its released.
"I don't currently have access to vcenter. Needing another team to take a look for me."
I was thinking of having this command somewhere in the image-builder pipeline for failures 🙂 to always get this in case of an error, also later when CI on vSphere is back.
or at least being able to toggle it on manually on a PR or so, if it can't be always on 🙂
But thinking about it, in my CI pipeline I don't actually have any way currently to save artifacts like that so would need a bit of a rewrite on my end anyway
Still, extra debugging of failures is always helpful
Update: Looks like the latest Flatcar stable release has a considerably slower boot time (by 1m30s), which is causing problems: the default boot_wait time we have set in image-builder is not long enough.
To work around this issue until Flatcar has worked out the cause of the slowdown, you can set the boot_wait Packer variable to something like 120s in your user-provided vars.
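For example, a minimal user-vars file sketch (assuming the variable is spelled `boot_wait`, the usual Packer spelling):

```json
{
  "boot_wait": "120s"
}
```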
120s was not enough over here; I used 180s instead (~145s would have been good enough).
Most time is for:
Yeah, exactly that. I found 120s worked for me (and I have a PR to set that) but it doesn't hurt to have a little more possibly.
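For anyone hitting this, the workaround discussed above can be expressed as a user-provided Packer vars file; the filename here is arbitrary and 180s is just the value that worked in the thread.

```shell
# Write a minimal user vars file overriding boot_wait for the slow
# Flatcar boot (filename and value are illustrative).
cat > flatcar-boot-vars.json <<'EOF'
{
  "boot_wait": "180s"
}
EOF
cat flatcar-boot-vars.json
```

You would then point PACKER_VAR_FILES at this file when invoking the make target, e.g. PACKER_VAR_FILES=flatcar-boot-vars.json make build-node-ova-vsphere-flatcar.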
I did also have trouble with the boot command itself failing too.
I don’t understand why it’s failing in the latest flatcar release though
which one?
With your PR I’m now at the ansible stuff (stable channel / 3975.2.0)
I mean I don’t understand what changed in the latest flatcar that has caused what was there before in image builder to fail
There’s some updates from the flatcar team in that issue I opened
☝️ Agenda is currently empty. Anyone got anything they'd like to discuss or shall we skip?
I don't have anything in particular, happy to skip unless someone adds something to the agenda.
Sorry for the late reply, yes I'm happy to skip. I'm actually helping my brother move today anyway.
Can someone please help me sanity check if an assumption I previously had about AMI builds is actually incorrect?
In my image-builder pipelines I'm setting the FLATCAR_VERSION env var to specify what Flatcar image to base my build on. This works as expected on Azure thanks to setting the distribution_version in the packer vars, but I don't think this env var is used when building AWS AMIs. 😞
For AMIs we use the following AMI filter:
Flatcar{{env FLATCAR_CHANNEL}}*
which just seems to take into consideration the Flatcar channel. I'm pretty sure updating the filter to the following would fix things:
Flatcar{{env FLATCAR_CHANNEL}}{{env FLATCAR_VERSION}}*
😞 Thanks for confirming. Guess I'll get a fix PR up shortly then
Image-builder v0.1.32 is now available:
Thanks to all contributors! 🎉
Image-builder v0.1.33 is now available:
Note: This is a small fix release to resolve the above issue with the latest Flatcar ISO release. There's no rush to upgrade to this if you don't use the Flatcar ISO. 🙂
In Ubuntu 22.04 we can find the file /etc/default/grub, and there is a line like this:
GRUB_CMDLINE_LINUX=" apparmor=1 security=apparmor"
How is it generated?
If you have added the gpu role as part of your node_custom_roles_pre and enabled the block_nouveau_loading variable (set to true), the nouveau driver will be blocklisted and the kernel will not load the module. This disabling is done as part of the gpu role.
I will try it out.
BTW, how about "intel_iommu=on"?
Can this trick be used together with yours?
- name: Enable IOMMU
ansible.builtin.lineinfile:
path: /etc/default/grub
regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((?:(?!intel_iommu=on).)*?)"$'
line: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on"'
backup: true
backrefs: true
notify: update-grub
Thanks, yes, I'm able to add it as follows:
- name: Enable IOMMU
ansible.builtin.lineinfile:
path: /etc/default/grub
backup: true
backrefs: true
regexp: ^GRUB_CMDLINE_LINUX="((?:(?!intel_iommu=on).)*?)"$
line: GRUB_CMDLINE_LINUX="\1 intel_iommu=on modprobe.blacklist=nouveau"
when: ansible_distribution == "Ubuntu"
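A quick way to sanity-check the negative-lookahead pattern used in the lineinfile tasks above (shown here with the flattened underscores restored; requires GNU grep built with PCRE support for -P):

```shell
# The pattern should match a GRUB line only when intel_iommu=on is absent,
# which is what makes the lineinfile task safe to re-run.
re='^GRUB_CMDLINE_LINUX="((?:(?!intel_iommu=on).)*?)"$'

check() {
  if printf '%s\n' "$1" | grep -qP "$re"; then
    echo "would edit"
  else
    echo "would skip"
  fi
}

check 'GRUB_CMDLINE_LINUX=" apparmor=1 security=apparmor"'   # would edit
check 'GRUB_CMDLINE_LINUX=" apparmor=1 intel_iommu=on"'      # would skip
```

The lookahead makes the task idempotent: a line that already contains intel_iommu=on no longer matches, so repeated Ansible runs don't append the flag twice.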
The above will blacklist the module so it is not loaded initially. However, if some other driver or something else tries to load it, it will still get loaded.
Someone has luck to create EKSA vSphere cluster with ubuntu 22.04 at all?
What's the better way to add?
Depends on whether you are likely to have a scenario where some other entity/driver tries to load the nouveau kernel module. If yes, and you want to make sure it is never loaded, you could add install nouveau /bin/false along with the modprobe blocklisting. If that's not the case, or it doesn't matter if the module is loaded later, what you have added should work.
I'm not going to be able to attend this morning, but we probably have some things to discuss. If we don't have anything on the agenda and don't meet, I'll be here most of the day to discuss whatever on Slack.
@mboersma /@Marcus Noble Can we close this issue since there has been no response?
Yeah I think so. Can always be reopened if needed
Hey folks, I'm trying to use Image Builder to build an AWS AMI that will be used for an "offline" (no Internet access) K8s cluster. I need to load some additional container images into the AMI. To do that, I made the following changes to the files in my cloned copy of the Image Builder repo:
I vaguely recall some discussion about airgapped environments a few months ago. I can't remember the details unfortunately, but you might be able to search this channel's history.
Hmm, looks like the airgap discussion was about image-builder itself, not about building images for airgapped environments. 😕
No worries! I appreciate the response. Ignoring the airgap portion for a moment, do the changes I mention above sound correct for preloading additional container images into an AMI?
I think so. But to be honest I'm not entirely sure as never used it myself.
Please report back how you get on! I'd be keen to hear if there was anything else.
I will almost certainly write a blog post with details once I figure this out! ✍️
Here's the solution. All changes need to go into images/capi/packer/config/additional_components.json. (No changes need to be made to the Ansible roles.) There are three changes to make:
additional_components to "true".
The resulting AMI will have all the "base" container images as well as the additional container images you specified.
Cross-posting this in case anyone else here has tried building the latest Flatcar on Azure yet. (Maybe @mboersma 🙏 )
☝️ @Drew Hudson-Viles if you're able to would you mind looking at this too?
😩 To make things worse, ubuntu:latest now points to Ubuntu 24.04 which apparently doesn't have a qemu package available to install. Today is not going well for me 😅
Do y'all think we should pin the base ubuntu image to the previous (22.04)?
Yup, figured that out. 🙂
Then hit an issue where 24.04 now has the ubuntu user as 1000 and imagebuilder gets created as 1001 which broke all my permissions 😩
I'm not having a great morning so far.
I need a holiday I think 😅 This week has been A LOT
But yeah, we might want to think about pinning the version as anyone building on top might experience similar issues. I suspect we should at least have it upgraded in a new release (maybe we pin to 24.04 and have that as a new release?)
I'll try and get a PR up shortly. Currently trying to fix the above flatcar problem first as it's causing us problems 😞
Oh fun! I've also just learnt that Goss failing doesn't actually result in the make command failing 🤦♂️
🤦♂️ The PR also needs the /lgtm label. (No idea why when it has the approve 🤷 )
Oh fun! I've also just learnt that Goss failing doesn't actually result in the make command failing
Realized that a couple of days back when working on our systems using IB... Goss validations failed but the build succeeded... wanted to raise an issue but forgot...
🤦♂️ i also forgot with everything else I was dealing with yesterday. I’ll create one today!
🤦♂️ And I've just noticed thanks to writing up this issue that my CAPA builds have Goss failing
☝️ I don't really know much about Goss so not sure I can work on fixing it, at least not right now anyway, so if anyone is able to help I'd very much appreciate it! 💙
Edit: A few minutes searching and I think I've figured it out 😆
You're too ninja fast for me. I log in and you've both found a problem and may have a fix 😄
I don't have an Amazon env to build these in unfortunately either. But I can take a look from the OpenStack perspective if needed still.
It looks like it's Flatcar-specific. It's just that I had logs saved for my previous AMI builds 🙂 (Edit: the failing Goss test is Flatcar-specific)
Anyway, I have a fix PR incoming 😉
(I'm having a MUCH more productive day so far than I did yesterday 😆 )
I also feel like I might finally be starting to get a grip on how most of image-builder fits together 🤣
Yeah I feel like I'm finally there now. The attempt to switch to HCL (which I've put on hold until we know where we stand with packer) made me learn a significant amount about the structure and more importantly the beast of a Makefile.
Oh the Makefile is still a "here be dragons" thing for me. I can tweak it a little but I wouldn't feel comfortable making any large changes to it 😅
Hi team, trying to understand previous work or future plans for supporting Fedora-based image building for CAPI. Could someone please help?
What are you wanting to know exactly?
There is no dedicated support to any OS or provider and we rely on community contributions. Are you looking to help out with Fedora support?
I see that image-builder does not currently support building Fedora images, so I'm looking to understand the plan for adding Fedora support, or how I can add the support myself.
First place to start would likely be to open an issue on the repo stating what versions of Fedora you're looking for and against what providers.
I know we have CentOS support for AWS at least so that might be a good place to look at what might be needed for adding Fedora.
Sure, thanks for the quick help. I will create an issue and possibly try to add Fedora support as well.
@Marcus Noble By any chance do you have any doc or reference or PR on what are the things needed to be done to add a new OS support. Just looking for some reference
We don't have any docs unfortunately (we probably should 😔) but I could likely find an example PR for you. Let me see what I can find…
Here’s a pretty large PR adding a totally new OS distro: https://github.com/kubernetes-sigs/image-builder/pull/1192
And here’s a couple small ones just adding newer versions of existing distros:
https://github.com/kubernetes-sigs/image-builder/pull/1500
https://github.com/kubernetes-sigs/image-builder/pull/1476
It just depends which version of Kubernetes you want. You can find a list on the Kubernetes releases page.
It likely won't be supported as image builder is configured to use the new repos which iirc started at 1.27 or 1.28. Also, they likely aren't maintained any more so it wouldn't get the latest updates or security patches.
basically what are the supported version when using the image builder?
Yeah that's the list. The only ones we can really support are the same as are listed in that link as anything further back is not supported by the Kubernetes community
Anything to discuss this week? I know we have one ongoing item but do we need to catch up today on this?
Hi, we are currently trying to generate ubuntu 22.04 image with kubernetes v1.28.0 but keep running into problems. Can somebody please help?
I’m no expert on qemu but this line looks relevant:
2024/09/10 09:37:30 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/09/10 09:37:30 Qemu stderr: Unable to init server: Could not connect: Connection refused
Maybe some sort of permission or connection issue? I haven't seen this before myself.
I'm able to build Ubuntu 20.04 with k8s v1.28, but run into the error when it's Ubuntu 22.04 with k8s v1.28
Oh that’s weird. I’m not aware of anything that might be different unfortunately 😔 Hopefully someone else has some suggestions. I’m currently travelling so not able to test myself currently but @Drew Hudson-Viles might be able to if he’s about this week? 🤞
I've managed to build this myself using the same command. the only difference is I didn't run it with sudo. I get a different error when running with sudo as I don't have the packages available in the PATH. It is definitely worth checking if this is the source of the problem though. But I can confirm the process definitely works.
I've dropped a comment in the issue anyway.
Might be worth trying with the container image too if possible
even without sudo it gets stuck at qemu: Waiting for SSH to become available...
I'd recommend launching the VNC and checking what's happening in the VM it launched in this case. I've run through this on my side and I can confirm it's 100% working.
The other option would be to try using the Docker image that's supplied to run a build.
we were able to generate the image now, Thanks!
something wrong with the dev environment we were working on, tried it from scratch on a new machine and it worked!
Image-builder v0.1.35 is now available: https://github.com/kubernetes-sigs/image-builder/releases/tag/v0.1.35
Thanks to all contributors!
Hello folks, I have opened a PR to fix a Goss bug that caused our RHEL 8 and 9 qemu/raw builds to fail due to validation errors. I have updated the PR description with the necessary information about the bug. I would greatly appreciate some reviews on it, thank you!
Agenda empty - shall we still meet or skip? Anyone got anything to discuss?
I think I'm going to skip. I could do with a rest as not been able to focus at all today 😩 If anyone decides to sync then ping me and I'll try to join.
Sounds good to me, sorry to chime in late. I don't have anything for the agenda but if anyone does, I'll check in a bit and start the meeting if so.
I'm actually getting all of this set up right now. I basically wrote a custom python script to generate packer vars including the correct ECR provider version and then wrote a role to install the provider.
I elected to put the logic inside a Python script because I'm not good enough with Ansible to navigate the Github releases API with it and choose the appropriate version.
I’m attempting to get it added into image-builder behind a Boolean toggle. Getting a bit tangled up with the ansible vars but hoping to have something today.
Well... hopefully I get it working today 🤞 We're currently blocked on an upgrade without it so I need to figure something out.
Heh...I know the feeling. We've been blocked on getting upgraded past 1.28.3 because the whole VMware acquisition led to the discontinuation of the CAPI images. I'm trying to get a whole build pipeline automated so we can quit worrying about winding up outside the supported version window.
Yeah, we build all our images too. But mainly because we wanted consistency across different providers that we support. It's constantly a work in progress it seems 🙈
Ok, at first glance, my changes are looking good:
/ # ls -la /host/opt/bin/ecr-credential-provider
-rwxr-xr-x 1 root root 477197 Sep 27 09:08 /host/opt/bin/ecr-credential-provider
/ # ls -ls /host/var/usr/ecr-credential-provider/ecr-credential-provider-config
4 -rw-r--r-- 1 root root 337 Sep 27 09:08 /host/var/usr/ecr-credential-provider/ecr-credential-provider-config
/ # cat /host/var/usr/ecr-credential-provider/ecr-credential-provider-config
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
matchImages: ['*.dkr.ecr.*.amazonaws.com', '*.dkr.ecr.*.amazonaws.com.cn']
defaultCacheDuration: "12h"
apiVersion: credentialprovider.kubelet.k8s.io/v1
env:
- name: AWS_PROFILE
value: "default"
/ # ps ax | grep /opt/bin/kubelet
3225 root 0:02 /opt/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cloud-provider=external --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --healthz-bind-address=0.0.0.0 --v=2 --image-credential-provider-config=/var/usr/ecr-credential-provider/ecr-credential-provider-config --image-credential-provider-bin-dir=/opt/bin
Just need to get another team to test it out and make sure it actually works with ECR then I'll get a PR up.
I did ask a question on the PR after it got merged. It's a bit of a nit, but I figure it's probably better to ask.
I wasn’t actually aware it was available at that URL. I struggled to find docs about it. If that URL works the same then yeah it makes sense to use that instead. Do you want to open a PR for it?
This is the first reference I could find for it related to sigs:
I was considering pushing out a new release with this change but I think I'm going to hold off until there's more changes that need releasing. Just, FYI 🙂
Image-builder v0.1.36 is now available:
This includes the above change I was discussing to optionally include the ecr-credential-provider.
I try to build vSphere image with such command:
IB_OVFTOOL_ARGS="--allowExtraConfig" make build-node-ova-vsphere-ubuntu-2204-efi
Same issue exists for the command:
image-builder build --os ubuntu --os-version 22.04 --hypervisor vsphere --release-channel 1-29 --vsphere-config vsphere-34.json --firmware efi
Looks like you might be able to force it to use a specific IP address at boot. Not sure how we solve this permanently though.
It might be we're not waiting long enough for it to detect the new IP address from vsphere
Try setting boot_wait to 300 in your packer vars and see if that helps at all. If it does we could try tweaking it until we find a low enough value that works.
Thanks, I tried the following 3 ways in the file packer/ova/packer-node.json:
Can you confirm that in the vcenter console it is picking up the new IP address?
Hmmmm... I thought that was how Packer discovered the IP. Maybe someone who knows more about vsphere can suggest something.
It looks like this may be caused by the behavior of the DHCP server in use.
Ah, that's one of the things that Marcus is getting at with the boot_wait.
The trick doesn't help (the IP still changed) with the image-builder build from source if put in late-commands.
Instead, if put in early-commands, it helps:
images/capi# git diff packer/ova/linux/ubuntu/http/22.04.efi/user-data
diff --git a/images/capi/packer/ova/linux/ubuntu/http/22.04.efi/user-data b/images/capi/packer/ova/linux/ubuntu/http/22.04.efi/user-data
index 095d9cef3..ee9bb9705 100644
--- a/images/capi/packer/ova/linux/ubuntu/http/22.04.efi/user-data
+++ b/images/capi/packer/ova/linux/ubuntu/http/22.04.efi/user-data
@@ -20,6 +20,9 @@ autoinstall:
# Disable ssh server during installation, otherwise packer tries to connect and exceed max attempts
early-commands:
- systemctl stop ssh
+ # Prevent DHCP release message from being sent on reboot
+ - iptables -A OUTPUT -p udp --dport 67 -j DROP
images/capi# IB_OVFTOOL_ARGS="--allowExtraConfig" PACKER_FLAGS="--var 'kubernetes_rpm_version=1.28.9' --var 'kubernetes_semver=v1.28.9' --var 'kubernetes_series=v1.28' --var 'kubernetes_deb_version=1.28.9-2.1'" make build-node-ova-vsphere-ubuntu-2204-efi
...
==> vsphere-iso.vsphere: IP address: 10.20.34.145
==> vsphere-iso.vsphere: Using SSH communicator to connect: 10.20.34.145
==> vsphere-iso.vsphere: Waiting for SSH to become available...   <-- the IP stays at 10.20.34.145
...
If there is side effect, what's the right way to fix the side effect?
ip_settle_timeout will make Packer wait for the specified duration before it asks for the VM IP address.
I've got a question around image-builder-proxmox, where can I ask about the proxmox image builder?
==> proxmox-iso.ubuntu-2204: => downloaded_iso_path/c968bbbeb22702b3f10a07276c8ca06720e80c4c.iso
I saw an older posting with this error and it sounded like it had something to do with the MIME type used when uploading.
==> proxmox-iso.ubuntu-2204: 501 for data too large
Build 'proxmox-iso.ubuntu-2204' errored after 2 minutes 31 seconds: 501 for data too large
==> Wait completed after 2 minutes 31 seconds
==> Some builds didn't complete successfully and had errors:
--> proxmox-iso.ubuntu-2204: 501 for data too large
I've only used the Proxmox provider a little bit for some testing so might not be able to help but I do remember it being quite a pain to get set up.
Can you share what make target you're using and what vars you're providing (with anything sensitive redacted)
Also, do you have any load balancer or proxy or anything in front of your proxmox api that might be imposing the limit? (e.g. nginx)
I can't remember exactly but I think I ended up manually downloading the base iso into Proxmox prior to running image-builder
I also ended up manually uploading an iso to proxmox, and I thought it helped at first, but then didn't seem to. Figured I was just starting to hack at it rather than fix the actual issue.
I wonder if you might click on the link I provided, as I put a bunch of additional detail in ticket 288.
answer:
Are you able to suggest any improvements to our documentation?
I'm thinking about it ... figured I should probably first get things working then maybe go back and add something. So far I'm thinking:
that's what i did when i fixed them; i listed out the user permissions; added the role; then listed them out again to see that they had changed
(cause at that point i wasn't sure if i had added the permissions before or not, just wasn't sure)
if i have time i'll put together a youtube video maybe, will let you know if i do
Perhaps there is an opportunity to make the process easier for folks in the future in the documentation, and perhaps in the error messages. Instead of 501 maybe "501, did you remember /api2/json?" Instead of "use of closed network connection" maybe "use of closed network connection (403)" or "use of closed network connection, see /var/log/pveproxy/access.log for more detail".
if someone forgets /api2/json, maybe it could just be added automatically ... or the URL environment variable could be checked early on to see if it contains '/api2/json' and fail fast if it doesn't
also, how do I specify which Kubernetes version to build? The doc page () lists 4 env variables, but I wonder if we actually just need one, like KUBERNETES_SEMVER, or do we need to specify all 4? When I run the build without specifying any KUBERNETES variables I don't really know what's going to happen; it will probably build the latest Kubernetes, I assume, but I don't know
That error message comes directly from Packer so not something we can really control in image-builder unfortunately. We could maybe do a check for the api2 path though. The example on the docs page does include the path though but maybe we need to make it clearer somehow.
If you have any suggestions on how to make those docs better we'd very much appreciate a PR! 😄 None of the maintainers of image-builder use Proxmox so we rely heavily on user contributions here.
I answered the kube version question in your other thread 🙂
Agenda currently empty. If no one has anything they'd like to discuss then lets skip 🙂
I’ll have to miss it regardless today, sorry! Taking our animals to the vet for a checkup.
Sorry, I've only just seen the time. The whole family is ill and the lack of sleep has made today just wizz by 😄
Should i be worried about these warnings in the image-builder logs? Skipped '/run/netplan' path due to this access issue
I've not come across this one myself but it may just be the path isn't available if it's skipping over. As long as the image is coming online on boot, I wouldn't be too concerned.
Thanks @Drew Hudson-Viles. The image does come online.
Where does "proxmox-iso.ubuntu-2404" come from? I mean, where is it built, so I can see what's in there and maybe tweak it a little? Looks like there is a bug and I was interested to look into it a bit. (Or is that the new VM being built? It said ISO so I thought it might have been an ISO)
==> proxmox-iso.ubuntu-2404: Error creating VM: format can only be one of the following values: cow,cloop,qcow,qcow2,qed,vmdk,raw
Build 'proxmox-iso.ubuntu-2404' errored after 3 minutes 31 seconds: Error creating VM: format can only be one of the following values: cow,cloop,qcow,qcow2,qed,vmdk,raw
I was hoping to troubleshoot the error maybe ... but I'm too in the dark on this one. Thank you. I'll just hang out and hope issue 1579 gets some attention.
How can I automatically set the PACKER kubernetes versions? I feel like I'm reinventing the wheel here ... and not sure how to get the DEB version:
# parse and set kubernetes env vars
echo "PACKER_FLAGS: $PACKER_FLAGS"
export VERSION=v1.31.1
#KUBERNETES_RPM_VERSION=1.29.6
#KUBERNETES_SEMVER=v1.29.6
#KUBERNETES_SERIES=v1.29
#KUBERNETES_DEB_VERSION=1.29.6-1.1
KUBERNETES_RPM_VERSION=$(echo $VERSION | cut -d 'v' -f 2)
KUBERNETES_SEMVER=$VERSION
KUBERNETES_SERIES=$(echo $VERSION | cut -d '.' -f 1).$(echo $VERSION | cut -d '.' -f 2)
KUBERNETES_DEB_VERSION=$(echo $VERSION | cut -d 'v' -f 2)-1.1
export PACKER_FLAGS="--var kubernetes_rpm_version=$KUBERNETES_RPM_VERSION --var kubernetes_semver=$KUBERNETES_SEMVER --var kubernetes_series=$KUBERNETES_SERIES --var kubernetes_deb_version=$KUBERNETES_DEB_VERSION"
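The cut pipeline above can be trimmed down with bash parameter expansion. The "-1.1" deb suffix is kept from the snippet but varies per package build, so treat it as an example:

```shell
VERSION=v1.31.1                          # example version
KUBERNETES_SEMVER=$VERSION               # v1.31.1
KUBERNETES_RPM_VERSION=${VERSION#v}      # strip the leading "v" -> 1.31.1
KUBERNETES_SERIES=${VERSION%.*}          # drop the patch part -> v1.31
KUBERNETES_DEB_VERSION=${VERSION#v}-1.1  # 1.31.1-1.1 (suffix varies per build)
echo "$KUBERNETES_RPM_VERSION $KUBERNETES_SERIES $KUBERNETES_DEB_VERSION"
```

Same results, no subshells or repeated cut calls.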
You can define your own vars JSON file and pass that in when calling image-builder.
E.g. I have a vars.json that looks something like this (with the env vars populated via shell first):
{
"ssh_clear_authorized_keys": "true",
"kubernetes_deb_version": "${KUBERNETES_VERSION}-00",
"kubernetes_rpm_version": "${KUBERNETES_VERSION}-0",
"kubernetes_semver": "v${KUBERNETES_VERSION}",
"kubernetes_series": "v${VERSION_MAJOR}.${VERSION_MINOR}",
"enable_containerd_audit": "true",
"ecr_credential_provider": "true"
}
And then set the environment variable PACKER_VAR_FILES to the location of that vars.json when calling image-builder's Make targets
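The "populated via shell first" step can be sketched with an unquoted heredoc, which lets the shell expand the variables as the file is written. The version values here are just examples:

```shell
KUBERNETES_VERSION=1.31.1
VERSION_MAJOR=1
VERSION_MINOR=31

# Unquoted EOF so ${...} references are expanded while writing the file.
cat > vars.json <<EOF
{
  "kubernetes_semver": "v${KUBERNETES_VERSION}",
  "kubernetes_series": "v${VERSION_MAJOR}.${VERSION_MINOR}"
}
EOF
cat vars.json
```

The resulting vars.json is then passed to the build via PACKER_VAR_FILES as described above.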
Unfortunately it's not quite as easy as just saying "give me Kubernetes v1.31", as the different platforms/OSs have different ways of pulling them in and we don't have the capacity in the project to keep an up-to-date mapping of those versions.
It's getting further along now. At the moment I'm seeing this message:
Error getting SSH address: 500 QEMU guest agent is not running
With proxmox build now seeing:
Error getting SSH address: 500 QEMU guest agent is not running
(dhcp is enabled and works)
The packer VM couldn't access the packer HTTP server.
You need to run this from a Proxmox VM (same network), for example, so you're sure that communication works
all tests show network connectivity is good and no firewall issues; able to curl the port image-builder opens up; gives me a 403, but that's ok, communication is good
i stopped a firewall to be completely sure, so i'll let it run for a bit with the 500 ... but i suspect the communication should happen pretty quick
still says waiting on 500, but the console isn't showing the startup screen anymore; I see automation in there so it seems to be working ... acting like it's stuck on 'installing kernel' ... fingers crossed it works
things were looking good, finished this time but with an error:
2024/10/11 10:09:01 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/10/11 10:09:01 [INFO] 0 bytes written for 'stdin'
proxmox-iso.ubuntu-2204:
proxmox-iso.ubuntu-2204: TASK [Gathering Facts] *
proxmox-iso.ubuntu-2204: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/travis/.ansible/tmp/ansible-local-350766c889qert/tmph0vrydqs /tmp/.ansible/ansible-tmp-1728662940.5550468-350781-30051397384999/AnsiballZ_setup.py:\n\n"}
proxmox-iso.ubuntu-2204:
proxmox-iso.ubuntu-2204: PLAY RECAP
proxmox-iso.ubuntu-2204: default : ok=1 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
proxmox-iso.ubuntu-2204:
2024/10/11 10:09:01 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/10/11 10:09:01 shutting down the SSH proxy
2024/10/11 10:09:01 [INFO] (telemetry) ending ansible
==> proxmox-iso.ubuntu-2204: Provisioning step had errors: Running the cleanup provisioner, if present...
==> proxmox-iso.ubuntu-2204: Stopping VM
==> proxmox-iso.ubuntu-2204: Deleting VM
2024/10/11 10:09:05 [INFO] (telemetry) ending ubuntu-2204
==> Wait completed after 23 minutes 34 seconds
2024/10/11 10:09:05 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
2024/10/11 10:09:05 machine readable: proxmox-iso.ubuntu-2204,error []string{"Error executing Ansible: Non-zero exit status: exit status 2"}
==> Builds finished but no artifacts were created.
2024/10/11 10:09:05 [INFO] (telemetry) Finalizing.
Build 'proxmox-iso.ubuntu-2204' errored after 23 minutes 34 seconds: Error executing Ansible: Non-zero exit status: exit status 2
==> Wait completed after 23 minutes 34 seconds
==> Some builds didn't complete successfully and had errors:
--> proxmox-iso.ubuntu-2204: Error executing Ansible: Non-zero exit status: exit status 2
==> Builds finished but no artifacts were created.
2024/10/11 10:09:05 waiting for all plugin processes to complete...
2024/10/11 10:09:05 /home/travis/.config/packer/plugins/github.com/hashicorp/proxmox/packer-plugin-proxmox_v1.2.1_x5.0_linux_amd64: plugin process exited
2024/10/11 10:09:05 /home/travis/.config/packer/plugins/github.com/hashicorp/ansible/packer-plugin-ansible_v1.1.1_x5.0_linux_amd64: plugin process exited
2024/10/11 10:09:05 /home/travis/.config/packer/plugins/github.com/hashicorp/ansible/packer-plugin-ansible_v1.1.1_x5.0_linux_amd64: plugin process exited
2024/10/11 10:09:05 /usr/bin/packer: plugin process exited
2024/10/11 10:09:05 /usr/bin/packer: plugin process exited
2024/10/11 10:09:05 /usr/bin/packer: plugin process exited
2024/10/11 10:09:05 /home/travis/.config/packer/plugins/github.com/YaleUniversity/goss/packer-plugin-goss_v3.2.12_x5.0_linux_amd64: plugin process exited
make: *** [Makefile:593: build-proxmox-ubuntu-2204] Error 1
should it be named as a Proxmox template? Or are you saying the randomly named ISO is the Proxmox template?
first time I'm using this; not even sure what the result is supposed to be. Not sure if this is a successful run or not since I got the ansible error
I would have expected something like, "Success! Generated xxx.iso".
What are you talking about?
What ISO
It should be Virtual machine template in Proxmox.
I don't see a template, I think after the ansible error it just shutdown the vm and deleted it.
==> proxmox-iso.ubuntu-2204: Provisioning step had errors: Running the cleanup provisioner, if present...
==> proxmox-iso.ubuntu-2204: Stopping VM
==> proxmox-iso.ubuntu-2204: Deleting VM
Again it seems some ssh or scp is failing in your setup.
Try checking your firewall and so on
or anything you need to add to ansible ssh or scp
The firewall is still disabled, so that shouldn't be the issue ... I'm not sure what else would be needed
Adding my two cents in case anyone else runs into this error and finds this thread…
I got this error after successfully building multiple images. Nothing obvious had changed, I even tried rebuilding images that previously worked.
I resolved it by updating the ubuntu-2404.json file. I changed the boot command to hard code my laptop’s IP address (where I’m running the make command).
"boot_command_prefix": "clinux /casper/vmlinuz --- autoinstall ds='nocloud-net;s=:{{ .HTTPPort }}/24.04/' initrd /casper/initrd boot ",
Thanks! I thought it was a duplicate of https://github.com/kubernetes-sigs/image-builder/pull/1586 at first but looks like it’s different Ubuntu versions. Do you know if any others need changing?
It's just an AWS issue as far as I know, as CAPA does some hacky things with cloud-init
We're working on a longer term solution so that we don't need to pin to an old version.
👍 great! Are you needing a new release of image-builder putting out then?
Ok. If you’re ok for now I can hold off until Monday to get a new release out.
Unless I find a bit of time tomorrow. I might kick off the process as there’s been a few fixes the past week.
😞 Looks like I'm actually unable to push a new release out right now -
Image-builder v0.1.37 is now available:
Some notable changes:
I'd like to run image-builder via a GitLab runner. I know what IP it will use, but since it's running in Kubernetes it's exposed via a Service, or even better an Ingress, and I have the FQDN. Where can I specify the IP or FQDN to use, or otherwise tell image-builder to use the load balancer IP when it calls back to the image-builder HTTP port?
Submitted as a feature request
Continuing to search through docs to see if the feature already exists and I just haven't found the right env vars yet.
Image-builder v0.1.38 is now available:
⚠️ Important
This release contains fixes for two CVEs - CVE-2024-9486 and CVE-2024-9594 (see kubernetes/kubernetes#128006 & kubernetes/kubernetes#128007 for more details).
It is highly recommended to update your version of image-builder and re-build all your VM images.
Hello Image-builder maintainers, we just migrated to v0.1.38 to fix the CVEs and our Nutanix builds started failing with the error
==> nutanix: Waiting for SSH to become available...
I am actively looking into it, but would appreciate it if any of you folks might know what's causing the builds to act up. Maybe some hard-coded credentials need to be removed?
==> nutanix: Error waiting for SSH: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain
==> nutanix: Force deleting virtual machine...
nutanix: Virtual machine successfully deleted
Seeing the same for Vsphere ISO
==> vsphere-iso.vsphere: Waiting for SSH to become available...
==> vsphere-iso.vsphere: Error waiting for SSH: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain
I'm wondering if it's this change and that uuid is being overridden.
https://github.com/kubernetes-sigs/image-builder/pull/1596/files#diff-19301ae03119dcb0a5ed81f1e5839a3a26725e486cb89a3cc3e70c1b1df0b159R160
Can you confirm your vars for me? Do you have a password set in there that configures this value already?
no we don't set the password manually, have always resorted to the builder/builder default, and now we've taken in this commit without any patches to it
Just noticed that QEMU and raw image builds also followed suit with the same failure reason
Hmmm, ok. This was tested as working so we'll have to do some digging into it.
I'm just kicking off a CAPV build in my environment to see if I'm seeing the same.
Sorry I'm phone troubleshooting at the moment but should be back at my pc later to assist where I can.
@Abhay Krishna Arunachalam Do you use the container image?
no, we use a different container image which contains the same versions of components as the SCL image-builder has in its ensure-*.sh scripts.
Ah ok, but same version of Packer, etc. yes?
==> vsphere-iso.vsphere: Creating VM...
==> vsphere-iso.vsphere: Customizing hardware...
==> vsphere-iso.vsphere: Mounting ISO images...
==> vsphere-iso.vsphere: Adding configuration parameters...
==> vsphere-iso.vsphere: Set boot order temporary...
==> vsphere-iso.vsphere: Power on VM...
==> vsphere-iso.vsphere: Waiting 3m0s for boot...
==> vsphere-iso.vsphere: Typing boot command...
==> vsphere-iso.vsphere: Waiting for IP...
==> vsphere-iso.vsphere: IP address: 10.10.222.89
==> vsphere-iso.vsphere: Using SSH communicator to connect: 10.10.222.89
==> vsphere-iso.vsphere: Waiting for SSH to become available...
==> vsphere-iso.vsphere: Connected to SSH!
==> vsphere-iso.vsphere: Provisioning with shell script: ./packer/files/flatcar/scripts/bootstrap-flatcar.sh
I'm not able to reproduce this at least with my setup.
Are you able to share what make target and what the final combination of user vars and environment variables are for your vSphere build (with anything sensitive redacted)?
Running this make target
make -C image-builder/images/capi build-node-ova-vsphere-rhel-8
Can you check if setting ssh_password to something yourself allows it to build or not?
kicked off a test build setting the password to a hardcoded string other than builder
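For anyone else reproducing this test: one way to pin the password is an extra var file layered onto the build via PACKER_VAR_FILES, the hook image-builder exposes for additional Packer var files. The file path here is illustrative; the make target is the vSphere one from this thread.

```shell
# Illustrative override file pinning ssh_password to a known value.
cat > /tmp/ssh-override.json <<'EOF'
{
  "ssh_password": "builder"
}
EOF

# Then, from your image-builder checkout (commented out here, shown for shape):
#   make -C image-builder/images/capi \
#     PACKER_VAR_FILES=/tmp/ssh-override.json \
#     build-node-ova-vsphere-rhel-8
```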
Quick question, won't we also need to change this and other locations since the password is not builder anymore?
Oh crap you might be right! I had no idea we hardcoded the password in Ubuntu like that!
but still, these files will be served and the user in the autoinstall will be created only after the SSH communicator goes through and then access the autoinstall files right? So it shouldn't affect the initial SSH connection?
Okay never mind, the boot command is typed before the SSH connection so I think it must definitely look at the builder user being created in the autoinstall file
@Abhay Krishna Arunachalam what happens if you set the password back to builder in your vars?
Running a build with builder and a separate one with hello. Will keep the thread posted
I'm curious how it worked in your case though, because it seems the SSH password should be the same as what's being created during autoinstall, but builder and the randomly generated UUID would be a mismatch
I use flatcar which uses ignition rather than the user data
ah I see. If I kick off a presubmit on image-builder through some dummy change, that would kick off an Ubuntu OVA right?
In terms of a fix, I think the UUID thing is internal to Packer so it's probably never printed or returned
so I think what we need to do is to generate it outside of Packer (think Makefile or script) and replace in the JSON as well as the user-data
was the pull-ova-all removed from Prow? I think that might have caught this
tried this on a linux box
$ uuidgen
04c84fad-871c-4bdf-9307-4ab8f16e5993
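The suggestion above (generate the secret once outside Packer, then substitute it everywhere it's needed) could look roughly like this. The @@SSH_PASSWORD@@ placeholder and the /tmp paths are made up for illustration, not the repo's actual files:

```shell
# Stand-in autoinstall snippet with a placeholder to replace:
cat > /tmp/user-data.tmpl <<'EOF'
chpasswd:
  list: |
    builder:@@SSH_PASSWORD@@
EOF

# Generate the password once (Makefile or wrapper script), then write it
# into both the Packer var file and the rendered user-data:
SSH_PASSWORD="$(uuidgen)"
printf '{ "ssh_password": "%s" }\n' "$SSH_PASSWORD" > /tmp/password-vars.json
sed "s/@@SSH_PASSWORD@@/$SSH_PASSWORD/" /tmp/user-data.tmpl > /tmp/user-data
```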
Update: the build with password set to builder passed, but the hello one is stuck waiting for SSH
Ah I see. Does the fix I suggested above make sense/sound feasible?
@Abhay Krishna Arunachalam I just saw your PR! Thank you so much! I’ll take a look at it hopefully later when I’m able to (currently travelling to a conference). Just wanted to check - have you tested if the change works when building a Flatcar image too?
so Flatcar already has a mkpasswd command wired into a sed command, which I chose not to touch
You legend, I was about to start looking at the issue this morning! 😄 I'll do some testing on a few things my side and give it my stamp then if all is well from the thing I can test.
🎉 Great! Hopefully I can get it reviewed this morning and get it merged in. I really appreciate the effort!
Thank you all, appreciate it! I'm also going to test it out on my end since we have a Nutanix/QEMU/Raw/Vsphere testing bed in our CI
Perfect. I can do the QEMU, Raw and OpenStack to make sure all is working on that side with this change too. (Never hurts to have a couple of perspectives with regards to the QEMU & RAW).
I'm having a couple of issues building QEMU atm; not sure if the envsubst is working as expected in terms of the file it outputs (it still contains the $ENCRYPTED_SSH_PASSWORD var). I am flipping between work and this though, so I might be missing something there 😉
Also we should probably consider gitignoring the generated files - i'll stick a note in the issue for that though.
yeah I had planned to put them in the gitignore too
Also I might just replace envsubst with sed, since it requires you to have the gettext package installed
Either replace with sed or have the binary downloaded into the local bin directory as part of the deps.
sed works, tested locally
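For reference, a minimal sketch of the sed approach; the ENCRYPTED_SSH_PASSWORD value is a made-up stand-in, and '|' is used as the sed delimiter because crypted hashes contain '/' characters:

```shell
# sed needs no extra package, unlike envsubst (which requires gettext).
# The hash below is a fabricated example, not a real crypt value.
ENCRYPTED_SSH_PASSWORD='$6$examplesalt$examplehash'

# [$] matches a literal '$' in the template without BRE ambiguity:
echo 'password: "$ENCRYPTED_SSH_PASSWORD"' \
  | sed "s|[\$]ENCRYPTED_SSH_PASSWORD|$ENCRYPTED_SSH_PASSWORD|"
# -> password: "$6$examplesalt$examplehash"
```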
the ideal solution would be to migrate the entire image-builder project to the Packer HCL format so we can use all the built-in functions and stop using these kinds of hacks 😉
I don't know if there is a plan for that
in any case, the above approach would need some additional modification for platforms whose builders need to inject user data directly inside the Packer config
Migrating to HCL is a massive undertaking unfortunately, and no one has offered / been able to invest the time into doing it. And as we are sorely lacking in testing, it's risky to make such a large change to the project as we can't be sure we don't break someone's use case. 😞
We do want to do it though. It's just not easy 😞
yes i totally understood, but for sure one day in the future we will have no more choice
I did actually test doing just the OpenStack builder and it's MUCH nicer in the HCL format, but yeah, huge task to do it for just that one - the whole project would be a lot of work. I'd like to do it, but so many edge cases and "hacks" would have to be considered
PR looks good to me. I'll leave with @Drew Hudson-Viles to review as he's in a position to test it 🙂 I'll add my lgtm
Thanks a lot for the quick review!! 🎉 I've put the PR on hold until I've had a chance to test it comprehensively on my end. But I should be able to get it merged today and y'all can then cut a release.
Also realized I need to fix it for Nutanix as well, since currently we have the userdata as a hardcoded base64 string in the packer config file (example), which resolves to
#cloud-config
users:
  - name: builder
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    shell: /bin/bash
chpasswd:
  list: |
    builder:builder
  expire: False
ssh_pwauth: True
Fixed Nutanix by adding a static cloud-init template which is base64-encoded and set as the user_data string during build time
Tested it on our CI and it works as expected
Not sure why the Azure presubmit is failing on the PR, is it unrelated?
I think that is currently broken and @mboersma is working on removing them from CI.
Ftr, I'm still testing things my side. I've been so busy today I've not had much time to go through it all properly I'm afraid. Last time I checked qemu didn't work but haven't synced the branch since this morning.
i tested nutanix/qemu/raw/vsphere builds on our CI and they all passed
I haven't had a chance to look since this morning and wont now until Thursday. I trust Drew can handle it though 😄 He knows more about these providers than I do
@mboersma thanks for merging this! Helped get the Azure presubmits passing on my PR
Look at all that green! 😍
Sorry for the spam! I added my image-builder PR as a patch in eks-anywhere-build-tooling and thought I'd show the CI results from the builds kicked off after the patch merged
@Drew Hudson-Viles thanks for the review! I have addressed your gitignore comment, are you good with that approach?
Yeah that's absolutely fine. I think I just missed the push where you added it 🙂
I've focussed on testing each time one comes in rather than reading them tbh. I've done a full read through and review now though so yeah, cracking work buddy!
Great! I see you've held it for other reviewers, thanks for the approval!
I should be able to take a look in the morning if no one else gets to it before then. We can then get the release pushed out. 🙂
Thanks for approving that @mboersma! I'll look into getting a release done tomorrow if I can get time in the morning.
I can do a release first thing if you want. Just need the PRs approving. 🙂
is the tag creation alone the trigger for the postsubmit?
In the past I have been able to re-trigger postsubmits by re-delivering the webhook payload corresponding to a PR merge from the repo settings. I'll admit it's not the ideal solution and I'm not sure if there are any side effects because tags are involved
Oh, Prow isn't configured to use webhooks in the repo 🤦♂️
Yeah, I need someone with more power than me it seems 😅
I don't understand why these have been so flaky recently. This is the third release in the past week that has failed initially, but I haven't had any problems in the previous ~1 year 🤨
Looks like someone triggered the re-run for me and it's now passed, so continuing with the release
Thanks for taking that on. I'm a 1 man band this week so time is short.
Yeah much better thanks. Still not 100% but functional non-the-less!
☝️ Following on from these issues (sorry 😞), I've created an issue to track our progress towards testing as many of the providers and OSs we support in image-builder as possible. Please take a look and let me know if you have any comments or suggestions.
/cc @mboersma as I know you've been doing a fair bit of work related to this recently 🙂 Hopefully I haven't missed anything.
That's internal right? Not something we could expose to PRs on image-builder?
I'd also like to say a big thank you to @Abhay Krishna Arunachalam for doing so much work to fix the ssh password issue for non-ignition distros! 💙 Huge help and something I was personally stressing about on Monday 😅
Happy to contribute, thank you for all the support in reviewing, testing and getting it merged! k8s-heart
Hi everyone! I have a clone of the image builder and need to know the release of this clone. Where can I find this information in the repository?
What do you mean by a clone of image builder? Of the git repo? You should be able to check the git commit to see what your copy is at.
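If the copy was made with git clone (full history and tags intact), the nearest release tag answers this. The throwaway repo below only demonstrates the output shape; in practice just run the last command inside the image-builder checkout:

```shell
# Demo repo so the commands run standalone; skip this setup in a real clone.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
git tag v0.1.38

# The nearest reachable tag identifies the release your copy is based on:
git describe --tags   # prints the tag, plus -<n>-g<sha> if commits follow it
```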
I'm having errors when building on Nutanix. Probably related to the random password thing.
I see the username and password are hardcoded in the user data here, is this expected?
and the error I get:
nutanix: output will be in this color.
==> nutanix: Creating Packer Builder virtual machine...
nutanix: Virtual machine ubuntu-2204-kube-v1.30.5 created
nutanix: Found IP for virtual machine: 10.10.141.63
==> nutanix: Using SSH communicator to connect: 10.10.141.63
==> nutanix: Waiting for SSH to become available...
==> nutanix: Error waiting for SSH: Packer experienced an authentication error when trying to connect via SSH. This can happen if your username/password are wrong. You may want to double-check your credentials as part of your debugging process. original error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain
==> nutanix: Force deleting virtual machine...
😞 Damn! I haven't got access to a Nutanix environment so I'm not sure I can help. Maybe @Abhay Krishna Arunachalam can assist if available.
Can you share your make target and user vars?
for vSphere yes, because the original code uses the Ubuntu Server ISO, while I'm using the Ubuntu cloud image (just like Nutanix)
Oh, I'm not really sure how to handle that. Might need to wait for @Abhay Krishna Arunachalam unless there's someone else here that can help
so my make target is make build-nutanix-ubuntu-2204 but I'm setting PACKER_VAR_FILES with 3 files:
I'll see if I can simply remove the user_data from my OS json file; it might have precedence over what's rendered by the template
I can confirm it works after removing it from my custom variable file
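For context on why removing user_data from the custom file helps: Packer applies var files in order, so later files in PACKER_VAR_FILES override earlier ones. A small simulation of that "last wins" layering (file names are illustrative):

```shell
# Two var files setting the same key, as Packer would receive them:
echo '{"user_data": "generated-by-template"}' > /tmp/base.json
echo '{"user_data": "hardcoded-in-os-file"}'  > /tmp/os.json

# Merge in argument order, mimicking repeated -var-file flags:
python3 -c '
import json, sys
merged = {}
for path in sys.argv[1:]:
    merged.update(json.load(open(path)))
print(merged["user_data"])
' /tmp/base.json /tmp/os.json
# -> hardcoded-in-os-file
```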
I'm thinking about adding a make target for vSphere + Ubuntu cloud image; I'm just wondering whether people would be interested in this.
Hey sorry just seeing this. Do y'all still need me to look into something?
Hello everyone! I'm new to image-builder and trying to get it to build an image using a local VMware Workstation 17.6.1 install on an Ubuntu 22.04 machine. I cloned the repo and attempted:
make build-node-ova-local-ubuntu-2204 PACKER_LOG=1
but it always seems to get stuck here:
2024/10/19 05:45:05 packer-plugin-vmware_v1.1.0_x5.0_linux_amd64 plugin: 2024/10/19 05:45:05 [INFO] Attempting SSH connection to 172.16.175.136:22...
I've noticed it seems to be setting a random password each time, and I attempt to log in with the password that is set in packer-common.json, for example:
2024/10/19 05:45:05 packer-plugin-vmware_v1.1.0_x5.0_linux_amd64 plugin: 2024/10/19 05:45:05 [DEBUG] reconnecting to TCP connection for SSH
2024/10/19 05:45:05 packer-plugin-vmware_v1.1.0_x5.0_linux_amd64 plugin: 2024/10/19 05:45:05 [DEBUG] handshaking with SSH
2024/10/19 05:45:09 packer-plugin-vmware_v1.1.0_x5.0_linux_amd64 plugin: 2024/10/19 05:45:09 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey password], no supported methods remain
2024/10/19 05:45:09 packer-plugin-vmware_v1.1.0_x5.0_linux_amd64 plugin: 2024/10/19 05:45:09 [DEBUG] Detected authentication error. Increasing handshake attempts.
"ssh_password": "VhS9RszwfbP1idLQ",I get the login prompt, but I can't login myself either, can't figure it out. Any help would be appreciated!
Hello, just following up: this was a pretty basic setup, just a minimal install of Ubuntu with build-essential, jq, VMware Workstation, etc., and the image builder, with all the defaults for the local OVA build. Am I missing anything?
Thanks!
I'll be on the train to London so won't be able to make this I'm afraid
No topics on the agenda currently, but if you have anything please add it!
I've just arrived at London so can make it if we have anything to discuss 🙂
Agenda still empty. What you wanna do @mboersma?
I guess we skipped! Sorry, was waiting for something to notify that there was an item.
Hello team,
I am having a problem with AWS AMI images built with the latest release of image-builder.
The nodes seem to join the cluster without an INTERNAL-IP, and therefore Calico doesn't run.
Any idea why this might be happening?
As far as I'm aware we haven't had any changes recently that might affect that, but I can't say for sure without more information.
{
"aws_region": "eu-north-1",
"ami_regions": "eu-north-1",
"kubernetes_semver": "v1.30.5",
"ami_groups": "",
"snapshot_groups": "",
"kubernetes_cni_semver": "v1.6.0"
}
Oh, so this isn't something that's broken in the latest release but more a case of it not working how you expected? Is that right?
Can you remind me what INTERNAL-IP is in this context? Is that a Kubernetes thing or an EC2 thing?
I honestly don't know which case it is exactly.
Basically when you run kubectl get nodes -owide the nodes should have INTERNAL-IP showing up, but in my case it doesn't as if the kubelet doesn't get the address to advertise it
Oh gotcha
🤔 Trying to recall where that comes from. I've just checked in one of my own clusters (which uses Flatcar rather than Ubuntu) and I see the IP populated.
Maybe someone who builds Ubuntu AMIs can confirm whether they see the internal IP or not for nodes? 🙏
I would add that I tried building a Flatcar AMI, but there I ran into another problem: the kubelet didn't manage to start, reporting that the kubelet config was not found
This might be a very unlucky case of things not working for me somehow.
I just checked my vars from back when we did build Ubuntu (months back now) and I noticed that I have this var defined:
"kubernetes_cni_deb_version": "**"
but I don't have kubernetes_cni_semver defined. Are you needing that specific version of the CNI?
Not necessarily; I was trying to have the latest versions of network-related things, hoping it would solve the problem
Can you try it without that set and see if that changes things? Would be good to rule it out at least.
Maybe try setting "kubernetes_cni_deb_version": "**" instead
Actually, while I think about it - we are talking about non-EKS right?
Building with "kubernetes_cni_deb_version": "" fails the goss test with this output:
amazon-ebs.ubuntu-24.04: "duration": 191859,
amazon-ebs.ubuntu-24.04: "err": null,
amazon-ebs.ubuntu-24.04: "expected": [
amazon-ebs.ubuntu-24.04: ""
amazon-ebs.ubuntu-24.04: ],
amazon-ebs.ubuntu-24.04: "found": [
amazon-ebs.ubuntu-24.04: "[\"1.4.0-1.1\"]"
amazon-ebs.ubuntu-24.04: ],
amazon-ebs.ubuntu-24.04: "human": "Expected\n \u003c[]string | len:1, cap:1\u003e: [\"1.4.0-1.1\"]\nTo satisfy at least one of these matchers: [%!s(
amazon-ebs.ubuntu-24.04: "meta": null,
amazon-ebs.ubuntu-24.04: "property": "version",
amazon-ebs.ubuntu-24.04: "resource-id": "kubernetes-cni",
amazon-ebs.ubuntu-24.04: "resource-type": "Package",
amazon-ebs.ubuntu-24.04: "result": 1,
amazon-ebs.ubuntu-24.04: "successful": false,
amazon-ebs.ubuntu-24.04: "summary-line": "Package: kubernetes-cni: version:\nExpected\n \u003c[]string | len:1, cap:1\u003e: [\"1.4.0-1.1\"]\nTo satisfy at least one of these matchers: [%!s(matchers.ContainElementMatcher=\u0026{0xc00040b4d0}) %!s(**matchers.ContainElementMatcher=\u0026{0xc00040b560})]",
amazon-ebs.ubuntu-24.04: "test-type": 0,
amazon-ebs.ubuntu-24.04: "title": ""
amazon-ebs.ubuntu-24.04: }
Hmmm... maybe that var is no longer valid. As I say this was from months back now 😅
But, if the published AMIs are also not working for you I suspect something else is going on instead as I know people are successfully using those images.
Do you specify any kubeadm config or similar when creating your cluster?
I can say now that I managed to pin down the problem to the tigera-operator
thanks for the help
I don't know exactly yet, but this is the error it, CoreDNS, etc. give:
Error from server: no preferred addresses found; known addresses: []
The problem was related to this
which required the CCM to run in the host network; I believe I took a long way around to figure it out.
thanks for help!
Hi all, I have built a Flatcar OS image (latest version from the main branch on GitHub) and loaded it into OpenStack Glance, and then I created a server from this image with my custom user data (I show it as a YAML file, but I converted it to Ignition format when booting my server):
variant: flatcar
version: 1.0.0
passwd:
  users:
    - name: core
      password_hash: "$y$j9T$qRgyCaQq.RDwlXNoe.4lS1$srnHt2JI76LZIEQrk1wgMYGvedk/21f0LTWnzH9Z3uB"
      ssh_authorized_keys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQA..........................xxxxx
    - name: stackops
      password_hash: "$y$j9T$qRgyCaQq.RDwlXNoe.4lS1$srnHt2JI76LZIEQrk1wgMYGvedk/21f0LTWnzH9Z3uB"
      ssh_authorized_keys:
        - ssh-rsa AAAAB3NzaC1yc2EAA...............................xxxxxx
      shell: /bin/bash
      groups:
        - root
        - adm
        - wheel
        - sudo
        - systemd-journal
        - docker
storage:
  files:
    - path: /etc/ssh/sshd_config
      overwrite: true
      mode: 0600
      contents:
        inline: |
          UsePrivilegeSeparation sandbox
          Subsystem sftp internal-sftp
          UseDNS no
          PermitRootLogin no
          AllowUsers core
          AuthenticationMethods publickey
Hi, if you have access to the Horizon dashboard you might be able to access the instance console and check that your SSH keys are correctly injected under .ssh/authorized_keys
Another question: why do you set the password_hash for the core user if the goal is to SSH with public keys only? If you enter the password, does it work?
EDIT: You can now use this SSHd configuration:
storage:
  files:
    - path: /etc/ssh/sshd_config.d/custom.conf
      overwrite: true
      mode: 0600
      contents:
        inline: |
          # Do not allow root user
          AllowUsers core
I'd like to run image-builder via a GitLab pipeline using a GitLab runner running in Kubernetes, but image-builder is using a callback server ... Since I'm running in Kubernetes, the local IP of the callback server can't be used; instead I have to expose the IP, which I am able to do, but I'm not sure how to tell image-builder to use the exposed IP. How can I specify the IP to use?
never found a solution to this; instead I'm using Ubuntu cloud images, which are bootstrapped using cloud-init (no need to get the kickstart config from the Packer web server).
that's difficult, because he says Packer runs in a container so it gets a new (pod) IP every time, and it might not be accessible from outside the cluster (for the VM to reach it)
Just thinking out loud:
Create an on-the-fly Service with a selector pointing to the specific pod instance?
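A sketch of that per-build Service idea, as a plain manifest; the name, labels, and ports are placeholders for whatever the runner pod actually uses (untested assumption, not an image-builder feature):

```yaml
# Hypothetical Service exposing the runner pod so the VM can reach
# Packer's HTTP port from outside the pod network.
apiVersion: v1
kind: Service
metadata:
  name: packer-callback
spec:
  type: LoadBalancer
  selector:
    app: gitlab-runner-build   # must match the specific build pod's labels
  ports:
    - port: 8080
      targetPort: 8080         # Packer's http_port for this build
```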
Another option: if you use a git-based solution, you could host the init config files on a server, say using nginx.
Re: "those config files are built by Packer and are dynamic": I am talking about files under, say, https://github.com/kubernetes-sigs/image-builder/tree/main/images/capi/packer/ova/linux
You could use git-based auto-update or something to make sure the server is always up to date
via the gitlab runner helm chart it is possible to ask for a loadbalancer ip, or expose the runner via an ingress FQDN (which gets added to dns automatically), if there were just a place to specify that ... but image-builder tries to use the ClusterIP of the pod which is unreachable.
I was starting to think about submitting a pull request, something like if 'such-and-such' env var existed, use that for the callback ip, otherwise detect the ip automatically ... but it was looking a lot harder than that. ... was hoping it wouldn't require a feature request; seems like this is going to become a more and more common use case
it can still bind to the ClusterIP; it's just the thing calling back that needs to call the alternative specified FQDN or IP
I'm not sure what is trying to use the callback IP. When is the IP getting passed along that is used to call back? It's getting passed to the VM somehow to use after things are set up. If I wanted to submit a pull request, which git repo do I need to look at?
we just want to pass an alternative fqdn or ip that we specify via env var
I'm still trying to do this but having trouble figuring out where the code is that I'd have to change. Is the code different for every provider? So if I fixed it for Proxmox, would it only make Proxmox builds work in Kubernetes?
Is it in the provider, in image-builder, in Packer, or in the packer-sdk?
How could I influence things such that all providers might one day be able to run via Kubernetes? Is there a kind of "standards group" I could suggest the idea to?
At this stage of the project, I guess this is going to be a little difficult, since lots of properties, specifically wrt Packer, are provider-specific (Packer plugins are different and hence a different set of props is required). Plus, bootstrap/cloud-init differs again based on the provider-OS combination, with not all providers supporting all OS flavours...
Maybe you can add an agenda item for this in the doc, and it can then be taken up during IB office hours
Being that I'm willing to put in the work, I kind of hate to let the idea die, but if it's just not a good idea I can let it go. I can just build images in a way that doesn't run in Kubernetes. It's just that I prefer everything to run 100% in k8s, making this the first outlier for me.
hello everyone!
I use the OpenStack Packer provider to create my Kubernetes images with image-builder. Is there a way for me to see the details of this image (services that come up together, etc.)?
Do you mean the output image on disk after image-builder has finished building it?
qemu: Setting up proxy adapter for Ansible....
==> qemu: Executing Ansible: ansible-playbook -e packer_build_name="qemu" -e packer_builder_type=qemu -e packer_http_addr=10.0.2.2:8984 --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url= containerd_sha256=041fa3cfd4e6689d37516e4c7752741df0974e7985d97258c1009b20f25f33c7 pause_image=registry.k8s.io/pause:3.9 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.7.20 containerd_wasm_shims_url=
qemu:
qemu: PLAY [all] *********************************************************
==> qemu: ssh: handshake failed: EOF
Not sure I’ll make this. Currently in Copenhagen trying to find my Airbnb 😅
Nice, hopefully that's starting a vacation!
I put a couple things on the agenda just hopefully, but IDK if we actually have any updates. We just haven't met in a while and I wanted to touch base, but if we're too busy today I'll move them to the next slot.
I won't be around for a few weeks after today - happy to push if needed, but just an FYI there 🙂
I don't have any updates on those items though
Sounds good. We can keep it short if no one has any updates or other topics.
Nice, hopefully that's starting a vacation!
Nope! Speaking at KCD Denmark tomorrow! 😁
Regarding the two topics on the agenda - there’s no update from my side. Tests are still in the same state I think and the Packer stuff isn’t moving and likely won’t.
Drew and I chatted about topics, but we don't really have any updates yet. I'll carry them over to the next meeting and maybe we've thought of something relevant by then. 🙂
I'm trying to use image-builder to build Ubuntu 22.04. Does anyone know how to use USG (Ubuntu Security Guide) to harden the image in image-builder?
I tried using "usg fix cis_level1_server" in an Ansible task; however, when I use the generated osImage to provision an EKSA bare metal cluster, I see errors such as:
"failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
"failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
Hey folks! 👋 Just created an issue based on the conversation I had with @Marcus Noble at KubeCon!
The idea is to see if there’s appetite for this in the community and work on a proposal together!
I've also pinned this issue in the repo to try and get more visibility and feedback on it 🙂
Hi Community,
I've been using image-builder for about 6-9 months now;
most of the time it has worked correctly, and when I encountered an issue, it was because I was missing something in the var_file.json.
Nevertheless, since yesterday, something quite unusual happens when it tries to build the image.
The "Add the Kubernetes repo key" step is failing:
openstack: TASK [kubernetes : Add the Kubernetes repo key] *
openstack: fatal: [default]: FAILED! => {"after": ["D94AA3F0EFE21092", "871920D1991BC93C"], "before": ["D94AA3F0EFE21092", "871920D1991BC93C"], "changed": true, "fp": "234654DA9A296436", "id": "234654DA9A296436", "key_id": "234654DA9A296436", "msg": "apt-key did not return an error, but failed to add the key (check that the id is correct and not a subkey)", "short_id": "9A296436"}
{
"source_image": "",
"networks": "",
"flavor": "gp1.small",
"floating_ip_network": "public",
"image_name": "ubuntu-2204-kube-v1.27.16",
"image_visibility": "public",
"image_disk_format": "raw",
"volume_size": "20",
"volume_type": "",
"ssh_username": "ubuntu",
"kubernetes_deb_version": "1.27.16-1.1",
"kubernetes_semver": "v1.27.16",
"kubernetes_series": "v1.27"
}
I bumped the topics from last time to this one, but I'm not sure we have any updates. But I'm happy to get together even if it's brief.
Yeah no updates as far as I know but it'd be good to meet up as I suspect this'll be that last of the year anyway.
I am not able to see this meeting on the kubernetes calendar -
what's the best way to add this meeting to my calendar. Usually joining the mailing list works but that seems to be not working in this case
Yes, and we're not able to get it on the calendar. It's properly set up in the community repo, but the tooling simply fails to publish several sig-cluster-lifecycle events, and image-builder is one of them.
SCL volunteers have put a fair amount of time into trying to fix this or find a workaround and eventually gave up. Sorry, not a good answer I know.
but the context helps. Thanks! I'll set up a reminder myself for now
Image-builder v0.1.40 is now available:
Thanks to all contributors! 🎉
Hello community.
I'm trying to build ubuntu-2404-qemu images using a VM hosted in Proxmox,
but it fails.
First, it takes a very long time to connect over SSH, and these messages are shown continually (for about 20 minutes):
2024/12/02 18:45:19 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:19 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey password], no supported methods remain
2024/12/02 18:45:19 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:19 [DEBUG] Detected authentication error. Increasing handshake attempts.
2024/12/02 18:45:26 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:26 [INFO] Attempting SSH connection to 127.0.0.1:2252...
2024/12/02 18:45:26 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:26 [DEBUG] reconnecting to TCP connection for SSH
2024/12/02 18:45:26 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:26 [DEBUG] handshaking with SSH
2024/12/02 18:45:28 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:28 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey password], no supported methods remain
2024/12/02 18:45:28 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:28 [DEBUG] Detected authentication error. Increasing handshake attempts.
2024/12/02 18:45:35 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:35 [INFO] Attempting SSH connection to 127.0.0.1:2252...
2024/12/02 18:45:35 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:35 [DEBUG] reconnecting to TCP connection for SSH
2024/12/02 18:45:35 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:35 [DEBUG] handshaking with SSH
2024/12/02 18:45:38 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:38 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey password], no supported methods remain
2024/12/02 18:45:38 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:38 [DEBUG] Detected authentication error. Increasing handshake attempts.
2024/12/02 18:45:45 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:45 [INFO] Attempting SSH connection to 127.0.0.1:2252...
2024/12/02 18:45:45 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:45 [DEBUG] reconnecting to TCP connection for SSH
2024/12/02 18:45:45 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:45 [DEBUG] handshaking with SSH
2024/12/02 18:45:48 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:48 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey password], no supported methods remain
2024/12/02 18:45:48 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:48 [DEBUG] Detected authentication error. Increasing handshake attempts.
2024/12/02 18:45:55 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:55 [INFO] Attempting SSH connection to 127.0.0.1:2252...
2024/12/02 18:45:55 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:55 [DEBUG] reconnecting to TCP connection for SSH
2024/12/02 18:45:55 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:55 [DEBUG] handshaking with SSH
2024/12/02 18:45:58 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:58 [DEBUG] SSH handshake err: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey password], no supported methods remain
2024/12/02 18:45:58 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:45:58 [DEBUG] Detected authentication error. Increasing handshake attempts.
2024/12/02 18:46:05 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:46:05 [INFO] Attempting SSH connection to 127.0.0.1:2252...
2024/12/02 18:46:05 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:46:05 [DEBUG] reconnecting to TCP connection for SSH
2024/12/02 18:46:05 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:46:05 [DEBUG] handshaking with SSH
==> qemu: Provisioning with Ansible...
qemu: Setting up proxy adapter for Ansible....
2024/12/02 18:54:00 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:00 Creating inventory file for Ansible run...
2024/12/02 18:54:00 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:00 SSH proxy: serving on 127.0.0.1:40695
==> qemu: Executing Ansible: ansible-playbook -e packer_build_name="qemu" -e packer_builder_type=qemu -e packer_http_addr=10.0.2.2:8167 --ssh-extra-args '-o IdentitiesOnly=yes' --extra-vars containerd_url= containerd_sha256=041fa3cfd4e6689d37516e4c7752741df0974e7985d97258c1009b20f25f33c7 pause_image=registry.k8s.io/pause:3.9 containerd_additional_settings= containerd_cri_socket=/var/run/containerd/containerd.sock containerd_version=1.7.20 containerd_wasm_shims_url=- -linux-x86_64.tar.gz containerd_wasm_shims_version=v0.11.1 containerd_wasm_shims_sha256={"lunatic":"7054bc882db755ce5f3ded46d114bfd4e0a318e437fa18a2601295d20b616b32","slight":"a6ea87d965037933a7d9edb5e20cfc175265c8e1ca92a16535f1f3c3f376f5b0","spin":"dcffedb8e4d2f585a851b3de489fa1e8a0054ec0ad72cf111c623623919245d0","wws":"e917f90692d798d80873aa0f37990c7d652f2846129d64fecbfd41ffa77799b8"} containerd_wasm_shims_runtimes="" containerd_wasm_shims_runtime_versions="{"lunatic":"v1","slight":"v1","spin":"v2","wws":"v1"}" crictl_url= crictl_sha256= crictl_source_type=pkg custom_role_names="" firstboot_custom_roles_pre="" firstboot_custom_roles_post="" node_custom_roles_pre="" node_custom_roles_post="" disable_public_repos=false extra_debs="" extra_repos="" extra_rpms="" http_proxy= https_proxy= kubeadm_template=etc/kubeadm.yml kubernetes_apiserver_port=6443 kubernetes_cni_http_source= kubernetes_cni_http_checksum=sha256: kubernetes_goarch=amd64 kubernetes_http_source= kubernetes_container_registry=registry.k8s.io kubernetes_rpm_repo= kubernetes_rpm_gpg_key= kubernetes_rpm_gpg_check=True kubernetes_deb_repo= kubernetes_deb_gpg_key= kubernetes_cni_deb_version= kubernetes_cni_rpm_version= kubernetes_cni_semver=v1.2.0 kubernetes_cni_source_type=pkg kubernetes_semver=v1.30.5 kubernetes_source_type=pkg kubernetes_load_additional_imgs=false kubernetes_deb_version=1.30.5-1.1 kubernetes_rpm_version=1.30.5 no_proxy= pip_conf_file= python_path= redhat_epel_rpm= epel_rpm_gpg_key= reenable_public_repos=true 
remove_extra_repos=false systemd_prefix=/usr/lib/systemd sysusr_prefix=/usr sysusrlocal_prefix=/usr/local load_additional_components=false additional_registry_images=false additional_registry_images_list= ecr_credential_provider=false additional_url_images=false additional_url_images_list= additional_executables=false additional_executables_list= additional_executables_destination_path= additional_s3=false build_target=virt amazon_ssm_agent_rpm= enable_containerd_audit= kubernetes_enable_automatic_resource_sizing= debug_tools=false ubuntu_repo= ubuntu_security_repo= gpu_block_nouveau_loading= --extra-vars ansible_python_interpreter=/usr/bin/python3 --extra-vars --scp-extra-args "-O" -e ansible_ssh_private_key_file=/tmp/ansible-key1609096398 -i /tmp/packer-provisioner-ansible138304401 /home/systemadmin/image-builder/images/capi/ansible/node.yml
qemu:
qemu: PLAY [all] **
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 SSH proxy: accepted connection
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 authentication attempt from 127.0.0.1:39488 to 127.0.0.1:40695 as builder using none
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 authentication attempt from 127.0.0.1:39488 to 127.0.0.1:40695 as builder using publickey
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 new env request: LANG=C.UTF-8
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 new exec request: /bin/sh -c '( umask 77 && mkdir -p "echo /tmp/.ansible"&& mkdir "echo /tmp/.ansible/ansible-tmp-1733165641.684557-2704-253976952207637" && echo ansible-tmp-1733165641.684557-2704-253976952207637="echo /tmp/.ansible/ansible-tmp-1733165641.684557-2704-253976952207637" ) && sleep 0'
2024/12/02 18:54:01 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 [DEBUG] Opening new ssh session
2024/12/02 18:54:01 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 [ERROR] ssh session open error: 'EOF', attempting reconnect
2024/12/02 18:54:01 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 [DEBUG] reconnecting to TCP connection for SSH
2024/12/02 18:54:01 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 [DEBUG] handshaking with SSH
2024/12/02 18:54:01 [INFO] 0 bytes written for 'stdin'
==> qemu: ssh: handshake failed: EOF
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 [INFO] 0 bytes written for 'stdout'
2024/12/02 18:54:01 packer-plugin-ansible_v1.1.2_x5.0_linux_amd64 plugin: 2024/12/02 18:54:01 [INFO] 0 bytes written for 'stderr'
2024/12/02 18:54:01 [INFO] 0 bytes written for 'stderr'
2024/12/02 18:54:01 [INFO] 0 bytes written for 'stdout'
Read from remote host 172.16.223.13: Connection reset by peer
Connection to 172.16.223.13 closed.
client_loop: send disconnect: Broken pipe
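One debugging step that might help here — a hedged sketch: port 2252 is taken from the log above, and the `builder` user name is an assumption that depends on your packer template's `ssh_username`:

```shell
# Probe the forwarded SSH port Packer is using (2252 in the log above)
# and see which auth methods the guest's sshd actually offers.
if command -v ssh >/dev/null 2>&1; then
  out=$(ssh -vv -p 2252 -o BatchMode=yes -o ConnectTimeout=5 \
        -o StrictHostKeyChecking=no builder@127.0.0.1 true 2>&1 | head -n 20)
else
  out="ssh client not installed"
fi
# "Authentications that can continue" shows what the guest offers
printf '%s\n' "$out" | grep -i 'authentications that can continue' \
  || echo "no auth banner (connection refused or closed)"
```

If the guest only offers password auth but your template is configured for a key (or vice versa), you'll see it here.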
If so, there should be a line something like this:
2024/12/08 01:27:09 packer-plugin-proxmox_v1.2.1_x5.0_linux_amd64 plugin: 2024/12/08 01:27:09 Found available port: 8395 on IP: 0.0.0.0
Which is an HTTP server setup that the vm calls back to in order to load its configuration. It may be that the vm can't reach back to the HTTP server. To help understand, you can test that HTTP server like this (the port may vary):
==> proxmox-iso.ubuntu-2204: Starting HTTP server on port 8395
$ curl localhost:8395
22.04/
24.04/
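To make that reachability check concrete, here is a sketch using a stand-in `python3 -m http.server` to simulate Packer's callback server (port 8395 is just the example from the log; against a real build you'd curl the port your Packer log reports, ideally from a host on the VM's network, not only localhost):

```shell
# Stand-in for Packer's callback HTTP server on the example port 8395.
python3 -m http.server 8395 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
# If this fails against a real build, the VM likely can't reach back either.
if curl -sf http://127.0.0.1:8395/ >/dev/null; then
  result=reachable
else
  result=unreachable
fi
echo "callback server: $result"
kill $srv 2>/dev/null
```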
👋 Hello, team!
I am trying to add a cronjob inside a distroless container that is running the application. It either gives an error or doesn't run correctly. How can I get this working? I want to create a cronjob that runs a shell script and a Python script.
This is the Dockerfile I'm using:
# Base image for building the application
FROM docker.io/debian:12-slim AS build
# Set Python environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
# Create directory for Gunicorn logs
RUN mkdir -p /app/logs/gunicorn
# Install necessary dependencies and libraries
RUN apt-get update && \
    apt-get install --no-install-suggests --no-install-recommends --yes \
    python3-venv python3-dev default-libmysqlclient-dev build-essential \
    libmariadb-dev pkg-config wget curl gnupg2 unzip cron \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Set up Python virtual environment
RUN python3 -m venv /pypi/venv && \
    /pypi/venv/bin/pip install --upgrade pip setuptools wheel
# Copy crontab
COPY web/crontab /etc/cron.d/crontab
RUN chmod 0644 /etc/cron.d/crontab
RUN touch /var/log/cron.log
# Install Python dependencies
FROM build AS build-venv
COPY web/requirements.txt /requirements.txt
RUN /pypi/venv/bin/pip install --disable-pip-version-check -r /requirements.txt pymysql wfastcgi gunicorn gevent
# Final stage: Set up the runtime environment
FROM gcr.io/distroless/python3-debian11
# Copy necessary files from previous stages
COPY --from=build-venv /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
COPY --from=build-venv /pypi/venv /pypi/venv
COPY --from=build-venv /app/logs /app/logs
COPY --from=build /etc/cron.d/crontab /etc/cron.d/crontab
COPY --from=build /var/log/cron.log /var/log/cron.log
# Set environment variables
ENV PYTHONPATH=web:$PYTHONPATH
# Copy application code
COPY . /app
WORKDIR /app
# Start Gunicorn
ENTRYPOINT ["cron", "&&", "/pypi/venv/bin/gunicorn", "web.APP.wsgi:application", "--bind", "0.0.0.0:8000", "--access-logfile", "/app/logs/gunicorn/access.log", "--error-logfile", "/app/logs/gunicorn/error.log", "--log-level", "info"]
#!/bin/sh
# Start gunicorn in background
/pypi/venv/bin/gunicorn web.APP.wsgi:application --bind 0.0.0.0:8000 --access-logfile /app/logs/gunicorn/access.log --error-logfile /app/logs/gunicorn/error.log --log-level info &
# Run periodic task
while true; do
python3 web/cron_script.py
sleep 300 # Run every 5 minutes
done
I think you might be asking in the wrong channel. This channel is for the https://github.com/kubernetes-sigs/image-builder project.
Hello image-builder maintainers, I have a fix for some RHEL image build issues which I observed in our CI, and it could potentially happen to anyone installing Ansible collection community.general version >= v10.0.0
@Marcus Noble / @mboersma PR submitted corresponding to the issue. If either of you gets time, can you please have a look at it?
I have marked it as draft... wanted to make a small edit to variablize (not sure if that's actually a word) the value for maxsize...
@mboersma I have marked it as ready for review and squashed the changeset.
What's the procedure to cut a tag/make a new release? What are the conditions under which we do that? (One, I am assuming, is when a CAPI release or K8s release happens.)
The release process is described here:
There isn't a release cadence currently, it's more based on when maintainers think changes in main justify tagging it. We just cut v0.1.40 a couple days ago.
Trying to use 'ISO_FILE' w/ proxmox provider:
ISO_FILE="tower:iso/ubuntu-22.04.5-live-server-amd64.iso"
Seeing error:
** one of iso_file, iso_url, or a combination of cd_files and cd_content must be specified for boot_iso
I added it to my proxmox.env file which I'm calling using this script:
$ cat go.sh
#!/bin/bash
docker run -it --rm --net=host --env-file proxmox.env \
  -v /tmp:/home/imagebuilder/images/capi/downloaded_iso_path \
  registry.k8s.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.40 build-proxmox-ubuntu-2204
Heya, I've been trying to build capi-images for a while now (for all of ubuntu-24.04, ubuntu-22.04, rocky9), but keep getting stuck at the same point -- both with qemu and with the remote-image builder on openstack. In all cases things die with a strange ansible error:
make build-qemu-ubuntu-2404 PACKER_LOG=1
I'd appreciate any pointers on how to fix this.
....
2024/12/13 13:43:40 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 [INFO] 0 bytes written for 'stdout'
2024/12/13 13:43:40 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 [INFO] 0 bytes written for 'stderr'
2024/12/13 13:43:40 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 [INFO] RPC client: Communicator ended with: 0
2024/12/13 13:43:40 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 [INFO] 0 bytes written for 'stdin'
qemu:
qemu: TASK [Gathering Facts] *
qemu: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /root/.ansible/tmp/ansible-local-1793709qosm0_m_/tmp5fp2y_ef /tmp/.ansible/ansible-tmp-1734093819.7338624-1793720-280401030368575/AnsiballZ_setup.py:\n\n"}
qemu:
qemu: PLAY RECAP *
qemu: default : ok=1 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
qemu:
2024/12/13 13:43:40 packer-plugin-ansible_v1.1.1_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 shutting down the SSH proxy
2024/12/13 13:43:40 [INFO] (telemetry) ending ansible
==> qemu: Provisioning step had errors: Running the cleanup provisioner, if present...
2024/12/13 13:43:40 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 failed to unlock port lockfile: close tcp 127.0.0.1:5998: use of closed network connection
2024/12/13 13:43:40 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 13:43:40 failed to unlock port lockfile: close tcp 127.0.0.1:3020: use of closed network connection
2024/12/13 13:43:40 [INFO] (telemetry) ending qemu
==> Wait completed after 11 minutes 49 seconds
2024/12/13 13:43:40 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
2024/12/13 13:43:40 machine readable: qemu,error []string{"Error executing Ansible: Non-zero exit status: exit status 2"}
==> Builds finished but no artifacts were created.
2024/12/13 13:43:40 [INFO] (telemetry) Finalizing.
==> qemu: Deleting output directory...
Build 'qemu' errored after 11 minutes 49 seconds: Error executing Ansible: Non-zero exit status: exit status 2
==> Wait completed after 11 minutes 49 seconds
==> Some builds didn't complete successfully and had errors:
--> qemu: Error executing Ansible: Non-zero exit status: exit status 2
are there any other errors that stand out? Unfortunately the failed to unlock port lockfile: close tcp 127.0.0.1:5998: use of closed network connection is usually just a red herring, as it's something else that caused the failure and this then gets output as a result.
I can confirm openstack builds are working with the latest release as I ran one yesterday afternoon that was successful. QEMU should be too but I've not personally tested.
I don't see anything else that might cause issues. However, if I run with FOREGROUND=1 I get the following:
make build-qemu-ubuntu-2404 PACKER_LOG=1 FOREGROUND=1
....
==> qemu: Starting VM, booting from CD-ROM
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 Qemu version: 8.2.0
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 Qemu Builder has no floppy files, not attaching a floppy.
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 Executing /root/.local/bin/qemu-system-x86_64: []string{"-display", "gtk", "-vnc", "127.0.0.1:63", "-drive", "if=none,file=output/ubuntu-2404-kube-v1.30.5/ubuntu-2404-kube-v1.30.5,id=drive0,cache=writeback,discard=unmap,format=qcow2", "-drive", "file=/root/.cache/packer/85d1bf86e5e0ecdd6e91515a63cc10bdab146dca.iso,media=cdrom", "-machine", "type=pc,accel=kvm", "-smp", "1", "-cpu", "host", "-device", "virtio-scsi-pci,id=scsi0", "-device", "scsi-hd,bus=scsi0.0,drive=drive0", "-device", "virtio-net,netdev=user.0", "-boot", "once=d", "-netdev", "user,id=user.0,hostfwd=tcp::4092_:22", "-m", "2048M", "-name", "ubuntu-2404-kube-v1.30.5"}
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 Started Qemu. Pid: 1795078
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 Qemu stderr: qemu-system-x86_64: -display gtk: Parameter 'type' does not accept value 'gtk'
==> qemu: Error launching VM: Qemu failed to start. Please run with PACKER_LOG=1 to get more info.
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 failed to unlock port lockfile: close tcp 127.0.0.1:5963: use of closed network connection
2024/12/13 14:41:54 packer-plugin-qemu_v1.1.0_x5.0_linux_amd64 plugin: 2024/12/13 14:41:54 failed to unlock port lockfile: close tcp 127.0.0.1:4092: use of closed network connection
==> qemu: Deleting output directory...
Build 'qemu' errored after 15 seconds 720 milliseconds: Build was halted.
==> Wait completed after 15 seconds 720 milliseconds
==> Some builds didn't complete successfully and had errors:
--> qemu: Build was halted.
==> Builds finished but no artifacts were created.
2024/12/13 14:41:54 [INFO] (telemetry) ending qemu
==> Wait completed after 15 seconds 720 milliseconds
2024/12/13 14:41:54 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
2024/12/13 14:41:54 machine readable: qemu,error []string{"Build was halted."}
==> Builds finished but no artifacts were created.
2024/12/13 14:41:54 [INFO] (telemetry) Finalizing.
2024/12/13 14:41:55 waiting for all plugin processes to complete...
2024/12/13 14:41:55 /usr/bin/packer: plugin process exited
2024/12/13 14:41:55 /root/.packer.d/plugins/github.com/YaleUniversity/goss/packer-plugin-goss_v3.2.13_x5.0_linux_amd64: plugin process exited
2024/12/13 14:41:55 /root/.packer.d/plugins/github.com/hashicorp/ansible/packer-plugin-ansible_v1.1.1_x5.0_linux_amd64: plugin process exited
2024/12/13 14:41:55 /root/.packer.d/plugins/github.com/hashicorp/qemu/packer-plugin-qemu_v1.1.0_x5.0_linux_amd64: plugin process exited
2024/12/13 14:41:55 /usr/bin/packer: plugin process exited
2024/12/13 14:41:55 /root/.packer.d/plugins/github.com/hashicorp/ansible/packer-plugin-ansible_v1.1.1_x5.0_linux_amd64: plugin process exited
2024/12/13 14:41:55 /usr/bin/packer: plugin process exited
2024/12/13 14:41:55 /usr/bin/packer: plugin process exited
make: ** [Makefile:543: build-qemu-ubuntu-2404] Error 1
Don't know if that qemu-system-x86_64: -display gtk: Parameter 'type' does not accept value 'gtk' is also causing things to die without FOREGROUND=1?
For the openstack remote build I also get
PACKER_VAR_FILES=openstack_vars.json make build-openstack-ubuntu-2204
No other errors besides this one.
....
openstack:
openstack: PLAY [all] *
openstack:
openstack: TASK [Gathering Facts]
openstack: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /root/.ansible/tmp/ansible-local-1726741dszour5u/tmpj3mgwet2 /tmp/.ansible/ansible-tmp-1733848994.5385716-1726748-172419909510439/AnsiballZ_setup.py:\n\n"}
openstack:
openstack: PLAY RECAP *
openstack: default : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
openstack:
==> openstack: Provisioning step had errors: Running the cleanup provisioner, if present...
ok cool. Well I'm running the qemu one now and it's working fine for me.
Sorry for being slow in replying -- Christmas activities 🎅 🙂
I now ran both commands:
qemu-img create -f qcow2 output/ubuntu-2404-kube-v1.30.5/ubuntu-2404-kube-v1.30.5 20480M
and
Formatting 'output/ubuntu-2404-kube-v1.30.5/ubuntu-2404-kube-v1.30.5', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=21474836480 lazy_refcounts=off refcount_bits=16
/root/.local/bin/qemu-system-x86_64 -smp 1 -machine type=pc,accel=kvm -netdev user,id=user.0,hostfwd=tcp::3020:22 -drive if=none,file=output/ubuntu-2404-kube-v1.30.5/ubuntu-2404-kube-v1.30.5,id=drive0,cache=writeback,discard=unmap,format=qcow2 -drive file=/root/.cache/packer/85d1bf86e5e0ecdd6e91515a63cc10bdab146dca.iso,media=cdrom -boot once=d -m 2048M -vnc 127.0.0.1:98 -cpu host -device virtio-scsi-pci,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive0 -device virtio-net,netdev=user.0 -name ubuntu-2404-kube-v1.30.5
Now it just sits there. I guess that's a good thing?
(diskImageBuilder-venv) [root@cirrus-deploy capi]# /root/.local/bin/qemu-system-x86_64 -smp 1 -machine type=pc,accel=kvm -netdev user,id=user.0,hostfwd=tcp::3020:22 -drive if=none,file=output/ubuntu-2404-kube-v1.30.5/ubuntu-2404-kube-v1.30.5,id=drive0,cache=writeback,discard=unmap,format=qcow2 -drive file=/root/.cache/packer/85d1bf86e5e0ecdd6e91515a63cc10bdab146dca.iso,media=cdrom -boot once=d -m 2048M -vnc 127.0.0.1:98 -cpu host -device virtio-scsi-pci,id=scsi0 -device scsi-hd,bus=scsi0.0,drive=drive0 -device virtio-net,netdev=user.0 -name ubuntu-2404-kube-v1.30.5
qemu-system-x86_64: warning: Machine type 'pc-i440fx-rhel7.6.0' is deprecated: machine types for previous major releases are deprecated
Gotta run for now. I'll pick this up again on Monday. Thanks a lot for looking into this!!!
No problem. Yeah once you've run that, it means it's sitting there and the instance should be running. You can use a VNC client to try and connect to ensure it's working. This rules out any issue with the qemu binary you're using anyway if it is working and you can connect.
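For reference, a VNC display number maps to a TCP port as 5900 + display, so the `-vnc 127.0.0.1:98` in the qemu command above means a viewer should target port 5998. A small sketch (nothing here assumes a build is actually running):

```shell
display=98                 # from "-vnc 127.0.0.1:98" in the qemu command
port=$((5900 + display))   # VNC display N listens on TCP port 5900+N
echo "connect a VNC client to 127.0.0.1:$port"
# e.g. vncviewer 127.0.0.1:5998, or check the socket with: nc -z 127.0.0.1 5998
```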
Have a good weekend and enjoy the festivities!
I have a "soft" conflict with office hours this morning, but I don't see anything on the agenda, so I'm assuming we will skip. Hopefully I'm not wrong; please follow up here in Slack if there are questions.
Yes I've got nothing today so I'm happy to skip this one. Not feeling too great either so more rest time is welcomed!
Hi. I am having issues building a qemu Ubuntu 24.04 image (22.04 works fine). The first task of ansible-playbook node.yaml fails to connect over SSH. It looks like this issue; . Does anyone have suggestions on how to debug further?
There's nothing currently on the agenda. Let me know if you have anything you'd like to discuss, otherwise we'll skip until next time (13 January 2025).
Hello, we are trying to build a Kubernetes VM image from an existing OVA template on vSphere; we use the image-builder project with the vsphere-clone packer builder. At the cloning stage we get the following error, even though the template folder exists; its full path contains a whitespace. We suspect the builder does not correctly parse the full path? Can anyone help us on this matter? Thank you
Hi,
I'm not a user of the vSphere side of things so I'm providing complete guesswork from my side.
It does seem like it's expected that no whitespace would be in the path though. The only things I can recommend is to either remove the whitespace from the path if possible or try supplying an escaped value for the space and see if that works.
If not, maybe someone who uses vSphere in anger can supply another option.
it seems that the vsphere plugin handles the path join, so we should have provided only the folder name directly, not the full path 😅.
Quick question, why is there no preseed-efi.cfg for ubuntu 22.04?
We can see such for ubuntu 20.04 which use base file instead:
./packer/raw/linux/ubuntu/http/base/preseed-efi.cfg
./packer/raw/linux/ubuntu/http/20.04/preseed-efi.cfg
The gz file, when decompressed, yields an image of size 6442450944 bytes, which is exactly 6144 MB, the disk size specified here in the raw packer.json. So I think by reducing the size here, you should be able to reduce the final image size. Although I must say I'm not sure about the side effects of doing so.
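Sanity-checking that arithmetic (6144 here is in MiB, i.e. 1024×1024-byte units):

```shell
# 6144 MiB expressed in bytes -- matches the decompressed image size
bytes=$((6144 * 1024 * 1024))
echo "$bytes"   # 6442450944
```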
Thanks, I can try, however, I tried to increase that value to 7G for some reason(s) I don't recall.
On a different note, are there packages that can be excluded/removed somehow to get the image size under 1GB?
preseed was deprecated a long time ago, not sure if it even works on newer Ubuntu versions
Hi everyone, I'm facing a weird issue and I'm running out of ideas; I'm hoping somebody from this channel would know 😛
I'm building Ubuntu 22.04 images on Nutanix using image-builder, and for some reason I would like to use a specific "release" of the Ubuntu 22.04 cloud image, with a specific kernel version/ABI. I've tried setting image_url to an older release and deployed the resulting template on a cluster, but the kernel version reported doesn't match the one from the Ubuntu release. I've tested with multiple releases; they all seem to have kernel version 5.15.0-130-generic, which is probably the latest for 22.04.
I'm trying to figure out if the problem comes from the upstream image(s), from the infrastructure building the image or image-builder itself (is it maybe updating the kernel during the build?). Any hint or ideas would be welcome.
there is an apt dist-upgrade in the image-builder process; I imagine this is what replaces your kernel version
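If pinning the kernel is acceptable, one possible workaround (an untested sketch — e.g. run it from a node_custom_roles_pre role, before the dist-upgrade step) is to hold the kernel packages so the upgrade can't replace them:

```shell
# Hypothetical workaround: hold the kernel metapackages so a later
# "apt dist-upgrade" leaves the running kernel alone. Package names are
# the usual Ubuntu ones; adjust for your flavour (e.g. -kvm, -virtual).
if command -v apt-mark >/dev/null 2>&1; then
  apt-mark hold linux-image-generic linux-headers-generic 2>/dev/null || true
  held=$(apt-mark showhold 2>/dev/null || true)
  echo "held packages: ${held:-none}"
else
  echo "apt-mark not available on this system"
fi
```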
☝️ anything to discuss today? Agenda currently empty
Damn, sorry just seen this. I’m not at my laptop now
No worries @Marcus Noble, we had a good discussion and we'll summarize in the document.
Hi all, I was wondering, is there someone here who uses the raw target to build images? Especially Ubuntu and/or Flatcar images
Hello, yes we use image-builder to build Ubuntu 20.04/22.04 and RHEL 8/9 raw images. Image-builder currently doesn't have Ubuntu 22.04 raw image support, so we patch it to include that functionality. The Ubuntu 22.04 subiquity autoinstall files for qemu just worked for raw builds too, and we validate these images in our e2e tests.
^^ I can test the raw (Ubuntu) target manually. If someone has the ability to test the other changes, that would be nice; or maybe we can split the PR?
Hello all, I am using build-node-ova-vsphere-clone-ubuntu-2204 to build a Kubernetes v1.30.7 vSphere OVA from the base template cluster-api-provider-vsphere 1.30.0. I am trying to build it without exposing an HTTP server to serve the meta-data and user-data for cloud-init; instead I tweaked the packer-node configuration for the vsphere-clone builder to use cd_files and cd_label in order to pass them to the VM on an ISO CD. Unfortunately, the build process gets stuck on the "Waiting for SSH to be available" step with the following msg. I suspect that cloud-init does not take the user-data file into consideration? Any input on this issue would be appreciated.
I have added the following to the apt section of the user-data config:
preserve_sources_list: false
primary:
  - arches: [default]
    uri:
I’d take a look to the vm packer creates maybe you get some clue out of that.
when I connect to the VM using the vSphere web console, I get the login prompt but no creds work for me. I have tried to connect with the builder user and its corresponding password (which I changed in the set-ssh-password script so it doesn't generate one randomly)
@zakaria Depending on the commit you are running from: if older, builder/builder should work if I am not mistaken. In the case of a newer commit, check the console (script) to get the password (the password is generated randomly during start and gets logged).
Also, if you are able to see that the VM has booted correctly with a login prompt, check whether the IP assigned to the VM and the one packer is waiting on are the same. If different, increase the ip_settle_timeout to say 20m or so, depending on the time the VM takes to boot up. Sometimes when bootup takes time (usually due to a slower network, primarily connecting to the Ubuntu servers to do updates), DHCP re-assigns a new IP; packer will not know of the new IP and waits on the older one. ip_settle_timeout causes packer to wait for the mentioned time before trying to get the IP of the VM
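To illustrate that suggestion — hedged: ip_settle_timeout and ip_wait_timeout are options of the Packer vSphere plugin, and whether your image-builder template passes them through depends on its version, so treat the names and values as assumptions:

```shell
# Sketch: put the wait/settle overrides in a var file and feed it to the
# build. "20m"/"30m" are example values, not recommendations.
cat > vsphere-wait-overrides.json <<'EOF'
{
  "ip_settle_timeout": "20m",
  "ip_wait_timeout": "30m"
}
EOF
# then, hypothetically:
#   PACKER_VAR_FILES=vsphere-wait-overrides.json make build-node-ova-vsphere-clone-ubuntu-2204
cat vsphere-wait-overrides.json
```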
@Sriraman Srinivasan I am using a static IP with the Linux customize options (it seems the image-builder project does not support it by default; I had to add the customize config to the packer-node.json file), and I set the ssh_host to the same IP address.
I am using the latest image-builder version v0.1.40, which uses packer v1.9.5
I have hardcoded $ENCRYPTEDSSHPASSWORD to a simple password in the set-ssh-password script; as I understand it, the user-data config sources the builder user's password from this variable.
I suspect that the cloud-init does not take into consideration the user-data I have configured
A humble request:
Though we all use image-builder to create the images for our Kubernetes clusters, we are currently unable to use it within Kubernetes pipeline solutions. So if you use an on-prem solution such as GitLab or Jenkins running in Kubernetes, you will not be able to run image-builder.
This is because the HashiCorp packer project has a limitation which breaks it when run inside Kubernetes. I've created a fix, which requires three tiny changes across three git repos. I humbly request: if you would like to run image-builder within a Kubernetes environment, please give my pull request over there a thumbs up. I'm not sure the folks reviewing my pull request have enough Kubernetes experience to really understand how freeing using their tool via Kubernetes would be.
Thank you for your consideration.
Additionally, I began working on a strategy to test everything image-builder related, but my solutions are always kubernetes based meaning I can't pursue those goals without being able to run things within a kubernetes environment. ()
Why are you unable to run in Kubernetes exactly? I run image-builder in Kubernetes with Tekton pipelines successfully for CAPA, CAPZ and CAPV
I'm also running image-builder within Kubernetes, with CAPX and CAPV.
For CAPV I have to apply a patch to the Makefile because we're using Ubuntu cloud images (so vApps) but overall its OK.
because when the VM completes, it sends an HTTP message back to packer; the IP it uses is the IP of the pod (the cluster IP), and so the VM fails to signal that it's finished
if you've already got a solution for that i'm all ears, i've put a ton of effort into this and would love to know there's already a method to make it work
If i'm not mistaken the HTTP call is used to get the "kickstart" (not sure about the term here). So if you're using cloud-init, this is not needed as all the information needed to bootstrap the VM is already known (via userdata).
i'm essentially running this same command but wanting to do it via kubernetes rather than on my vm with docker:
docker run -it --rm --net=host --env-file proxmox.env \
  -v /tmp:/home/imagebuilder/images/capi/downloaded_iso_path \
  registry.k8s.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.38 build-proxmox-ubuntu-2204
🤔 Maybe this is something Proxmox-specific then?
I have tried to solve the issue in a way that would be easy for any provider to implement. Logically, each provider could implement their own unique non-standard workaround, which I suspect the more popular providers have probably done.
Do you know what exactly in the VM performs the callback? I didn't think Packer actually ran on the VMs when building but I could be wrong about that.
there's a place in the proxmox provider where it binds the HTTP server the VM will talk to on 0.0.0.0; it then runs a function that tries to get the IP address of the system, and provides the IP address it finds to the VM. My solution adds a new variable to the packer SDK; the proxmox provider then checks if that variable exists and, if it does, uses that as the IP address instead of running the function to get the IP of the system it did the 0.0.0.0 bind with. There is another variable which looks like it could be used for that purpose, but it's slightly different in that it offers an alternative bind address, when it exists, instead of 0.0.0.0
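The effect is easy to see from inside a build pod: generic "what's my address" discovery returns the pod IP (cluster-internal, e.g. 10.x), which an external VM has no route to. A sketch (which commands are available depends on the image):

```shell
# Whatever address generic discovery picks is the pod IP when run in a
# pod -- fine on a plain host, unroutable for an external VM.
addr=$(ip route get 1.1.1.1 2>/dev/null | grep -o 'src [0-9.]*' | awk '{print $2}')
[ -n "$addr" ] || addr=$(hostname -i 2>/dev/null | awk '{print $1}')
echo "detected address: ${addr:-unknown}"
```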
Ah ok, so this is a Proxmox feature. That explains why no one else has reported it. We have very few Proxmox users.
It's this functionality, right?
I don't think it's necessarily proxmox specific. It's been a while, but when I was deep-diving this a few months ago I think what I found was some sample code that people could use when they wanted to put together a provider, and that sample code has this problem (feature?) in it. So I'd say it's more sample-code related, and probably in multiple providers, than specifically proxmox.
Sorry, let me say "proxmox-specific in image-builder"
though at the moment i'd have to dive in again, i'm not sure if it was clusterapi sample code or image builder sample code
i'm not sure this is specific to Proxmox, is it? when using ubuntu 22 live server ISO, it expects the boot command, which usually refers to an HTTP server (which is hosted by the Packer host)
one sec, reviewing my own pull requests which point out where it is in the code
Ok, I guess what I really want to know then is why this isn't impacting the other providers in the same way. If we can figure that out then maybe we can solve it for Proxmox.
for CAPV I end up with the same issue if I'm using the Ubuntu live server ISO
because the packer pod has a non routable (pod) IP, which is set by Packer in the boot command - but the VM cannot reach it
I suspect if we ran image-builder via kubernetes with every provider and each of the images many would build the image, timeout after like 20 minutes, and fail because the vm couldn't report in.
ok - so the reason I've never hit this in our environment then is we build Flatcar not Ubuntu which I guess doesn't have this behaviour because of ignition.
But how are we not having lots of people complaining about this? 😕 Surely this would be a blocker for plenty of people?
I remember seeing a couple of issues from people failing to build Ubuntu on CAPV
maybe when it doesn't work people just give up and switch to another solution; or they aren't running pipelines in k8s because they haven't got to that point in their experience with kubernetes; or they just use AWS and maybe that one works (i haven't tried); or when it doesn't work they just give up, run docker via a VM, and hope someone else fixes the problem some day
I'm guessing then that ssh doesn't become available because the VM didn't initialise and setup ssh server because it couldn't reach back to packer
that looks like it, "not planned", yet my pull request would solve that one ... so i should link it to my pull request
"not planned" is because it went stale as no-one could solve it or offer more info
which is understandable, i worked on it for weeks before figuring it out
so your PR would allow overriding the HTTP server IP, right? how would you determine the IP to use then?
it took a long time cause i had to learn code in three repos to solve it, and well open source so didn't have a ton of time ... but i became a little obsessed i'll admit
in my case i use gitlab which uses a gitlab-runner in kubernetes and it makes the ip available as an env var
but what IP would you set? cause the runner has a dynamic IP, every time it runs it gets a different IP
the helm chart has a setting 'use loadbalancer' or just clusterip, so i set that to loadbalancer
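For anyone following along, the underlying object is just a Service in front of the runner pod; roughly this shape (the name, namespace, selector and port are assumptions from my setup, not anything image-builder ships):

```yaml
# Illustrative sketch: give packer's HTTP server a stable, VM-reachable address.
# Everything here (names, labels, port) is an assumption from my environment.
apiVersion: v1
kind: Service
metadata:
  name: packer-http
  namespace: gitlab
spec:
  type: LoadBalancer   # or ClusterIP, if the VMs can route to cluster DNS/IPs
  selector:
    app: gitlab-runner # matches the pod actually running packer
  ports:
    - port: 8080       # whatever port packer bound within http_port_min/max
      targetPort: 8080
```

The point being: the IP/hostname of this Service is what you'd want packer to announce to the VM, rather than the pod IP it discovers itself.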
Hold on, before we go too far with this - we have tests in place on image-builder that successfully build ubuntu-2204 on Azure. Maybe we should have a look at what that provider does differently.
sorry, i am supposed to be at work ... so i have to come back to this when my day ends, i can look into that log, but i can look into their custom solution after work, and can work on things this weekend
Oh I'm not saying you need to solve it, or we need to solve it right now. I'm just saying its something to look at as an alternative possible solution 🙂
i'm definitely motivated to get this implemented, and my preference is to solve it for all providers if possible, rather than rewrite proxmox and let the issue remain elsewhere
Sure, but if we can solve it without needing a load balancer then that would be my personal preference 🙂
since you both understand the issue, if you could help communicate it over in that packer-sdk pull request, that'd help the person making the decision on whether to include it or not
@Marcus Noble i suppose there's nothing stopping us from putting in the kubernetes specific url, in my case something like gitlab-runner.gitlab.svc so we can use the clusterip
but we'd still need the implementation i've put together (as far as i can tell)
maybe if you could both just add a note over there that the issue does exist, and that the pull requests could solve it, though i understand folks might decide as a group there's a better way
What I mean is the Azure provider manages without the need of calling back to a http server it seems. I'd like to understand how it does that.
Ah! I think Azure uses Azure Resource Manager which gives it more capabilities
My concern mainly is even if we get that PR in, how do we handle this in environments where the cluster isn't routable from the cloud (that's the case for me). Also, what security implications are there in opening up that endpoint to the web? (As I don't understand this enough)
i'd use an ingress for sure, but it'd all just be internal in the end
Yeah, that'd work in your specific case but I'm trying to think about the other environments this may run. And also thinking about the recent security incidents we've had and make sure we don't introduce anything more. 🙂
dang it, i want to work on this, but have to go to actual job ... afk for awhile, will be in and out
Just to be clear - I'm not trying to block anything here, I just want to figure out all the options 🙂
so everyone isn't implementing something unique; unique solutions make for security vulnerabilities
Yeah, but I suspect that is totally what's happening 😞
Azure and GCE seem to use pre-existing images (kind of AMIs) so they probably don't need the boot command
Ah! Good point! I guess that'll be the same for AWS too then.
I need to head off also as it's end of my day now. I'm going to try and think through this over the weekend. If we go down this route I think we need some automated way to expose the endpoint securely but I'm not sure we will be able to as there's nothing that says image-builder is running in a cluster. 🤔
at the end its just a way to override the IP address that is announced to the VM, its not changing the way it is actually running
if there's a security risk linked to this, then it is already present as packer is already running on that image builder k8s pod
Yeah, I’m just trying to figure out how this will work in image-builder
I've just read all of this thinking "but I don't have this problem in OpenStack either" - then I read the last 4 comments and the penny dropped. The problem here is systems that don't use the boot command are generally fine. Anything that does evidently seems to be having this issue.
That's a fascinating bug though, and whilst it appears to be an edge case when it errors out, it's not an unlikely scenario that would cause it, i.e. someone wanting to actually build something in K8s which relies on a boot command.
Not sure what the solution is yet, just wrapping my head around it all myself and thought I'd chime in with my 2 cents!
For vsphere/capv afaik it also works in a mode where the cloud init data is provided via an iso instead of relying on the http callback.
Especially remembering this part in that pr https://github.com/kubernetes-sigs/image-builder/pull/1459/files#diff-92590d094272c3e33208dbb6a15aff2a1f1e9d866b7411a2c38271e57f5b2728R78
Because upstream ci (when it still existed) did run packer in a pod inside the prow cluster. There was no way to call back from the VM to the pod where packer was running.
Any idea how that is used @chrischdi? Looks like we install it into the container image so maybe we can use that for Proxmox then? I dunno what it actually does though.
One question is: does the proxmox plugin allow mounting a second ISO to get the cloud-init data from,
and allow uploading the built ISO (so it can be mounted)? But I've never used proxmox.
You can upload ISO to proxmox, that bit I know. I don't know about the first. Can you show me how it's used in vsphere provider and I'll see if I can figure it out
ah yes! It's about half way down the page (annoyingly no anchor to link to)
So it might be possible to add:
"cd_content_location": "./packer/proxmox/linux/{{user distro_name}}/http/{{user distro_version}}/**",
"cd_label": "cidata",
to the proxmox ubuntu values?
IMHO the most robust way for this stuff as the network back to packer is not needed.
Until this thread I didn't even know the http call was how it worked 🙈
If no one has come up with a better way, maybe consider the solution I've put together and comment on the pull requests / github issues.
Have you tried the cd approach? That would make it inline with others and not require HTTP connectivity.
I'm not sure what you mean by cd, so no probably. I'll scroll up in this chat and look for 'cd'.
Still, I mean ... even if there is a workaround, why not solve it so a work-around isn't required.
It isn’t a workaround. It’s the approach used elsewhere in the project.
I feel resistance to the idea I've proposed, I don't want to go against the flow, could you help me to see why the idea I've suggested isn't a good idea?
I'm just curious, genuinely interested in your insight before I give up ... I mean, the hard part has already been done ... I've already implemented a solution.
I’m not suggesting that. I’m just asking to try the approach that we already have used with other providers and see if that solves the problem. That way we don’t need any external changes.
k, sorry, what do you mean by cd? that translates as 'continuous delivery' in my brain
Yeah that. Sorry I’m on my phone. I should have got you a link.
no worries, working on several other things at the moment ... appreciate the response
☝️ Anyone want to use the office hours to discuss the above thread? (Or is the thread good enough for now?) Any other topics as the agenda is currently empty?
I'm easy on this - I think the thread is ok for the moment unless we've some immediate action we want to take?
I do have to leave to pick my daughter up at 4:45 though so I'd have to do a quick one for me 😄
I don't have any agenda items in particular, but happy to talk. I just now caught up with the megathread.
tbf, It has been a while so at the very least we should have a catch up 🙂
I have a question regarding a PR, so it would be nice to have a short meetup
I'm having some internet issues so I might not actually be able to join. Go ahead without me if I'm not there
Unfortunately I won't be able to make this one. We've a new starter today and I'm in meetings until around 5:30 UK time.
I'm also not going to make it today but the agenda is looking empty so I think it's ok to skip.
Hi Team,
I am trying to build an image using below command
PACKER_LOG=1 PACKER_FLAGS="--var 'kubernetes_rpm_version=1.30.4-0' --var 'kubernetes_semver=v1.30.4' --var 'kubernetes_series=v1.30' --var 'kubernetes_deb_version=1.30.4-00'" make build-qemu-ubuntu-2204
but getting an ssh handshake error:
Attempting SSH connection to 127.0.0.1:2877...
reconnecting to TCP connection for SSH
handshaking with SSH
SSH handshake err: ssh: handshake failed: read tcp 127.0.0.1:33368->127.0.0.1:2877: read: connection reset by peer
PACKER_LOG=1 PACKER_FLAGS="--var 'kubernetes_semver=v1.30.4'" make build-qemu-ubuntu-2204
Can someone please help?
I think you might be hitting this issue (which closed without a resolution by the looks of it 😞 )
Does anyone have a process for adding custom tags to AWS AMIs when building them with image-builder?
I had totally forgot about this 🤦♂️ I'm going to need to skip. Got too much that still needs to get done.
No worries Marcus! I can be there but the agenda is currently empty, so unless someone speaks up soon I think we will skip until next time.
Hello @Marcus Noble @mboersma @Drew Hudson-Viles just curious, what is the tentative date for the next IB release v0.1.41?
The project doesn't have a release schedule, we've been doing releases when "enough" new features or bug fixes have landed. We may be overdue for one.
Ah! I knew there was something I wanted to mention 🤦♂️ Yeah. I think we're overdue one, I noticed we have a few unreleased merges the other day.
There's a PR in the queue right now that may merge, but after that I can do a release.
Especially excited for this one because a couple of IB patches we're maintaining on EKS-anywhere side have been upstreamed (thanks to @s3rj1k 🎉)
How can we know the password of the default user "ubuntu". Or is there any option to configure it as I'm not able to find anything related to this. Found in a discussion that a random UUID is taken for password during image build process.
It depends on the provider you're using.
For example, the QEMU provider has it dynamically set via a script which generates a packer.json from this template:
Other builders will allow you to pass it in as a var - you'll have to look at which provider you are using and the builder it invokes to pass the correct value through.
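Roughly what that amounts to (an illustrative sketch, not the actual wrapper script in the repo):

```shell
# Illustrative sketch: generate a random throwaway password for the temporary
# "builder" user at build time; the real script templates something like this
# into packer.json as ssh_password.
SSH_PASSWORD="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
echo "builder password (32 hex chars, build-time only): ${SSH_PASSWORD}"
```

So the credential only ever exists for the duration of that one build.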
The UUID password is for the "builder" user that should only be used during the initial image creation. I thought you're asking for an ssh user within the generated image at the end, yes?
aaah yeah sorry, I may have misread that Marcus - good catch about the resulting image 😄
I don't think, by default, we generate any ssh user in the resulting image. (I could be wrong, not sure about all providers and OSes)
To my knowledge no we do not. The method I mentioned is about setting one during the build process.
Generally speaking you would pass in things like credentials/user creation or public keys for your ssh keypair via cloud-init .
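e.g. a minimal cloud-init user-data sketch (the user name and key below are placeholders, adjust for your deployment):

```yaml
#cloud-config
# Illustrative user-data: create a login user in the *deployed* VM at first boot.
# The user name, key and sudo policy here are placeholders, not project defaults.
users:
  - name: ubuntu
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... you@example.com   # placeholder public key
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
```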
Hey everyone,
I recently opened a PR to add support for a new provider, Canonical MaaS.
What’s the process for adding a new provider? Should this be discussed with someone in particular, or would it be a good topic for the next office hours meeting?
Here’s the PR:
Hey Victor, I'm sorry I meant to respond to your PR and just lost track of time with everything going on.
I'm having a chat with the other maintainers about this (and related topics) as we're at a tipping point where we're not sure if we can commit to managing any more providers or operating systems. We think we might need to come up with some sort of minimum set of requirements for adding new ones, but we're not sure yet what that might be. Ideally, any new providers would also come with tests, but that's really only viable if the providers themselves are willing to donate infrastructure to run tests on (do you know if this is something Canonical is likely to be interested in?)
I think it makes sense to talk about this in the office hours.
/cc @mboersma @jsturtevant @Drew Hudson-Viles
Hello Marcus,
I'm not sure if Canonical is interested in this. By the way, I don't work at Canonical but at another company that is integrating with MaaS.
Regarding donating infrastructure to run MaaS, I can check internally at my company. It might be feasible, but I need to discuss it with many people to make it happen.
IMHO, I think this would be a very interesting and important step for the CAPI project to support bare metal directly. As you know, with the rise of AI and related technologies, the use of bare metal will increase considerably, and MaaS makes it much easier to deploy a Kubernetes cluster.
Hello @Marcus Noble
Sorry to bother you with this, but I'd like to go over some points we discussed in the last office hours.
I left a comment on the PR , asking if my understanding of our discussion was correct.
I'm not sure if I fully understood which README (or if this should perhaps go into the book?) and what its content should be. If you could help with some topics, I can continue from there.
I'm asking because I want to speed up the process of getting this provider merged into master as much as possible, so we can move forward with our project more smoothly.
And regarding the owners file, who is responsible for updating it? Should I do it? How does this process work?
I’m currently travelling, @Drew Hudson-Viles or @mboersma are you available to help out?
Oh, Marcus, I'm really sorry to bother you with this!! Enjoy your trip!
We're just launching a (quiet) go live today so might have limited time but I will try and find some if I can and if Matt doesn't get in there before me!
Got it! no worries if it’s not possible today. Good luck with the go-live!
Does image-builder support full disk encryption (boot disk) for ubuntu 22.04 now?
Why chronyd, when provider has matching NTP set?
I compare the result of make build-hcloud-ubuntu-2404 with a vanilla ubuntu24.04 created in hcloud.
The vanilla image has this setting:
/etc/systemd/timesyncd.conf.d/hetzner.conf:NTP=ntp.hetzner.com
thank you for the reply. I see that there is a lot of chrony config in image-builder. I will see if I can disable it.
no background on ntp in image-builder; from personal experience in the past: chronyd behaves way better, and I/we had issues in the past when using timesyncd
So the time switched here, but not yet in the UK? So is ~10 minutes from now the right time, or am I just making US-based assumptions?
The UK switched ages ago. We're back on UTC 😄
My notification popped up so I think you're right
Maybe I should delete this and let you recreate? Then it should (hopefully) maintain the right time with the US?
Ok, I recreated it for 8:30 am my time. Had to fiddle around to get it to start two weeks from now.
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 9:30AM every other Monday (next occurrence is March 17th), Mountain Daylight Time.
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 9:30AM every other Monday (next occurrence is today), Mountain Daylight Time.
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/1YIOD0Nnid_0h6rKlDxcbfJaoIRNO6mQd9Or5vKRNxaU/edit” in this channel at 8:30AM every other Monday (next occurrence is March 24th), Mountain Daylight Time.
Hello image-builder maintainers, I was trying to build GPU-ready OVAs by setting the vsphere ISO builder's pci_passthrough_allowed_device field. vSphere allows users to assign multiple PCI passthrough devices (GPU cards, video capture cards, audio cards, etc.) to a virtual machine without specifying an exact physical device on a particular ESXi host. It does this through a feature called Dynamic DirectPath I/O, which requires virtual hardware version 17. However, since we're hardcoding this to 15, I'm not able to get this feature working. This version hasn't been changed in 5 years; I think we should update this outdated version or remove it. From the Packer documentation, the vsphere builder's vm_version field, if not set, defaults to the most current virtual machine hardware version supported by the ESXi host. Kindly let me know your thoughts on this.
We have overridden the hardware version to 18 for windows 2019 and windows 2022, but only because the corresponding guest OS types were not supported in hardware version 15. Maybe we should bump the default to a newer version while still allowing overriding for backward compatibility with older vCenter versions?
It sounds like the unset default might actually be best if I understand correctly. That would then always use the latest version available in the environment you’re building the image, yes?
We should still allow it to be set if needed for backward compatibility though.
I haven't tried the unset default route myself and for the time being, am resorting to overriding vmx_version in the image-specific packer file (rhel-8 OVA.json for example), but based on the documentation, it seems like it should work
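Concretely, the interim override just amounts to something like this in the image-specific packer file (the version value here is illustrative):

```json
{
  "vmx_version": "18"
}
```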
If there's no objections from others I'd be in favour of that. But I don't have all that much knowledge of OVA, so maybe worth someone else weighing in first.
In case you are using capv: you should be able to overwrite the hardware version: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/main/apis/v1beta1/types.go#L205
Hello everyone!
I came across a need that I couldn't find in Ansible for the Image Builder.
I need to set some kernel parameters for the OS via sysctl. These parameters must persist after a reboot (and should also prevent users from having to define them through init containers, for example).
What would be the most viable solution for this?
Should I create a new role that copies the parameters to /etc/sysctl.d/98-custom.conf?
Or would it be better to have a generic role that copies any local file to a specified path in the image? This way, we could also solve future issues related to custom files that need to be added to the image.
Depending on our discussion here, I can work on this task
My suggestion would be to use node_custom_roles_post and supply your own custom ansible role.
If it turns out there are others interested in similar then we can port it upstream into image-builder but as of right now I haven’t seen anyone looking for similar that wasn’t very specific so might be best with the custom role. 🙂
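For anyone searching later, a custom role for this can be as small as the following sketch (the file name and parameter values are examples, not recommendations):

```yaml
# Illustrative tasks for a custom role supplied via the post-build roles hook:
# persist kernel parameters across reboots via a sysctl.d drop-in.
- name: Persist custom kernel parameters
  ansible.builtin.copy:
    dest: /etc/sysctl.d/98-custom.conf
    content: |
      net.ipv4.ip_forward = 1
      vm.max_map_count = 262144
    mode: "0644"

- name: Apply sysctl settings now
  ansible.builtin.command: sysctl --system
```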
Very fair. I had a feeling that something to solve this kind of problem had already been considered.
I'll be skipping today. I have a bunch of things I'd like to get ready for KubeCon next week 😅
It looks like there isn't anything in the agenda anyway - shall we skip altogether?
Hello Image-builder maintainers, when trying to build RHEL 9 OVAs in CI, we ran into this error for all Kubernetes versions
Build 'vsphere-iso.vsphere' errored after 3 minutes 29 seconds: No host is compatible with the virtual machine.
After some digging, I discovered this is because the RHEL 9 guest OS type rhel9_64Guest was introduced in vSphere API release 7.0.1.0 (source), which is compatible with virtual machine hardware version 18 (source). But image-builder hardcodes the VM hardware version to 15, which doesn't support RHEL 9. We worked around this on our end by patching image-builder, after which the builds succeeded. I'm upstreaming the patch in this PR. Kindly take a look and let me know if there are any concerns. Thank you!
I'm also curious if others ran into this, I'd be surprised if they didn't 🤔
Sorry for the delay. Currently at KubeCon and busy busy busy 😆
Ah sorry for the untimely ping! Really appreciate you reviewing and merging the PR despite your busy schedule! ty
Nah don't worry about it 🙂 Always appreciate the contributions!
Nothing on the agenda, but I'm happy to have office hours if anyone has anything to talk about. Speak up in the next few minutes if so... :-)
Ok, let's skip today.
Please add to the agenda for next time if you have something to discuss or present, or ask your questions here in the Slack channel. 😄
Sorry, I've been stuck in meetings and didn't even see the pop-ups for slack 🤦♂️
Brains still recovering from 12.5k people over 5 days 🤣🤣
🤦♂️ sorry, I was out yesterday for my friend's birthday and totally forgot about this.
Anyone else no longer have access to the agenda? Or is it just me for some reason?
Also, if possible I’d rather we skip today but can make it if needed
Yeah I'm being prompted to request access. But yes, bank holiday weekend in the UK, so won't be available myself.
I have the same problem--I also lost access to another SCL Google Doc at the same time.
But I have access to CAPI and CAPZ docs, so it's not a matter of me not having the right perms. I think we need the actual document owner to refresh its permissions so Kubernetes in general can see it.
I don't--maybe @jsturtevant does? Or hopefully someone in sig-cluster-lifecycle leadership, I can ask Fabrizio.
I had a tab still open with the agenda to I managed to grab a backup of it before reloading.
You can view it here:
It's currently set to comment-only as I don't want this to become the new agenda as it's tied to Giant Swarm but I wanted to make sure we didn't lose the history 🙂
I think Fabrizio is an owner, he said he could help us move it to the appropriate k8s document area where apparently it should have been to avoid this problem. Crossing fingers.
Hello everyone,
We’ve encountered an issue while using kubernetes-sigs/image-builder to build Red Hat-based node images for Cluster API (CAPI) workload clusters.
After provisioning a workload cluster with these images, we’ve noticed that /etc/resolv.conf on the nodes includes two unexpected nameserver entries . These are not defined in our bootstrap data, cloud-init config, or Image Builder templates.
Example output of resolv.conf:
; Created by cloud-init automatically, do not edit.
Generated by NetworkManager
search foo.bar local
nameserver 10.x.x.1
nameserver 10.x.x.2
These additional nameservers are not configured by the user and their origin is unknown.
Image-builder v0.1.42 is now available:
Thanks to all contributors! 💙
Another bank holiday in the UK and I'm away so I'm not going to be around for this I'm afraid. Sorry!
But it also seems like the agenda doc isn’t sorted yet(?) so no items anyway I guess
Hello all,
I've made a feature request regarding the support of ARM64 azure VM template build:
I've described the list of detected blockers based on what I've experienced on my side.
As said in the issue, I'm not comfortable making a PR for this due to my lack of knowledge of how you would like to see this feature handled.
Trying to add a single dedicated Makefile target (i.e. sig ubuntu 22.04 ARM64) seems to cause a lot of duplicated code, and on the other hand, trying to add the notion of processor architecture more globally requires a lot of modifications to things I don't know enough about and am not able to test.
I'm open to feedback and will be happy to help where I can to see this feature supported 🙂.
Hello Everyone,
I understand that the latest Ubuntu versions (22.04 and 24.04) do not support preseeds anymore, so it's not referenced over here.
How can I configure the preseed to add changes to the cloud-init / auto-install file? I tried updating base/preseed-efi.cfg.tmpl and 22.04.efi user-data.tmpl but I don't see it reflected. Also, in the logs when trying to build make build-raw-ubuntu-2204-efi, I can see multiple preseeds and user-data (cloud-init) files being logged. What's the order of precedence?
Could someone please guide on this ?
TIA
@Marcus Noble / @mboersma Can you please take a look at the PR? This adds the ability for the user to specify ansible roles to be run after the sysprep stage. Currently there are no hooks which allow the user to run custom playbooks/roles after all the stages (including sysprep, but before goss validation runs).
cc: @rajas
@Marcus Noble / @mboersma Can you please have a look at the above PR?
Both me and Matt are currently away this week. Is it ok if we get to it next week when we’re back?
@Marcus Noble Let me see if I can get someone else to look at these. Please don't let this interrupt your break. Sorry for pinging during it (didn't realize you were away).
It’s totally fine. I’m actually away for work so I might get chance to look at it for you but it’s not guaranteed
(Did see this notification but not sure when I get back to it)
@Marcus Noble / @mboersma Once you are back, can you please have a look at the PR?
👍 I'm going to try and get to it today but I have a lot to catch up on so apologies if it takes me a while
LGTM. Assigned to the others for approval but if no one comes back by this afternoon I'm happy to merge it 🙂
Sorry, all, I'm back now. Thanks for being on top of things @Marcus Noble!
We’ll have to skip today’s office hours, as the active maintainers are either on break or at an offsite.
[Packer Build Failing with "ssh: handshake failed: EOF" on Ubuntu 24.04]
Hi everyone,
I’m encountering an issue during a Packer build where the Ansible provisioner fails with an SSH handshake error. The build consistently crashes at the same step, and I’d appreciate any insights.
Environment
==> qemu: ssh: handshake failed: EOF
2025/05/20 23:11:46 [ERROR] ssh session open error: 'EOF', attempting reconnect
Current Configuration:
{
"builders": [
{
"accelerator": "{{user accelerator}}",
"boot_command": [
"{{user boot_command_prefix}}",
"{{user boot_media_path}}",
"{{user boot_command_suffix}}"
],
"boot_wait": "{{user boot_wait}}",
"cd_files": [
"{{user cd_files}}"
],
"cd_label": "cidata",
"cpu_model": "host",
"cpus": "{{user cpus}}",
"disk_compression": "{{ user disk_compression}}",
"disk_discard": "{{user disk_discard}}",
"disk_image": "{{ user disk_image }}",
"disk_interface": "virtio-scsi",
"disk_size": "{{user disk_size}}",
"firmware": "{{user firmware}}",
"format": "{{user format}}",
"headless": "{{user headless}}",
"http_directory": "{{user http_directory}}",
"iso_checksum": "{{user iso_checksum_type}}:{{user iso_checksum}}",
"iso_url": "{{user iso_url}}",
"memory": "{{user memory}}",
"net_device": "virtio-net",
"output_directory": "{{user output_directory}}",
"qemu_binary": "{{user qemu_binary}}",
"shutdown_command": "echo '{{user ssh_password}}' | sudo -S -E sh -c 'usermod -L {{user ssh_username}} && {{user shutdown_command}}'",
"ssh_password": "{{user ssh_password}}",
"ssh_timeout": "2h",
"ssh_username": "{{user ssh_username}}",
"type": "qemu",
"vm_name": "{{user vm_name}}",
"vnc_bind_address": "{{user vnc_bind_address}}"
}
],
"post-processors": [
{
"environment_vars": [
"CUSTOM_POST_PROCESSOR={{user custom_post_processor}}"
],
"inline": [
"if [ \"$CUSTOM_POST_PROCESSOR\" != \"true\" ]; then exit 0; fi",
"{{user custom_post_processor_command}}"
],
"name": "custom-post-processor",
"type": "shell-local"
},
{
"environment_vars": [
"OUTPUT_DIR={{user output_directory}}",
"ARTIFACT_NAME={{user artifact_name}}",
"KUBEVIRT={{user kubevirt}}"
],
"inline": [
"if [ \"$KUBEVIRT\" != \"true\" ]; then",
"exit 0",
"else",
"bash ./packer/qemu/scripts/build_kubevirt_image.sh {{user build_name}}-container-disk",
"fi"
],
"name": "kubevirt",
"type": "shell-local"
}
],
"provisioners": [
{
"environment_vars": [
"PYPY_HTTP_SOURCE={{user pypy_http_source}}"
],
"execute_command": "BUILD_NAME={{user build_name}}; if [[ \"${BUILD_NAME}\" == \"flatcar\" ]]; then sudo {{.Vars}} -S -E bash '{{.Path}}'; fi",
"script": "./packer/files/flatcar/scripts/bootstrap-flatcar.sh",
"type": "shell"
},
{
"ansible_env_vars": [
"ANSIBLE_SSH_ARGS='{{user existing_ansible_ssh_args}} {{user ansible_common_ssh_args}}'",
"KUBEVIRT={{user kubevirt}}"
],
"extra_arguments": [
"--extra-vars",
"{{user ansible_common_vars}}",
"--extra-vars",
"{{user ansible_extra_vars}}",
"--extra-vars",
"{{user ansible_user_vars}}",
"--scp-extra-args",
"{{user ansible_scp_extra_args}}"
],
"playbook_file": "./ansible/firstboot.yml",
"type": "ansible",
"user": "builder"
},
{
"expect_disconnect": true,
"inline": [
"sudo reboot now"
],
"inline_shebang": "/bin/bash -e",
"type": "shell"
},
{
"ansible_env_vars": [
"ANSIBLE_SSH_ARGS='{{user existing_ansible_ssh_args}} {{user ansible_common_ssh_args}}'",
"KUBEVIRT={{user kubevirt}}"
],
"extra_arguments": [
"--extra-vars",
"{{user ansible_common_vars}}",
"--extra-vars",
"{{user ansible_extra_vars}}",
"--extra-vars",
"{{user ansible_user_vars}}",
"--scp-extra-args",
"{{user ansible_scp_extra_args}}"
],
"playbook_file": "./ansible/node.yml",
"type": "ansible",
"user": "builder"
},
{
"arch": "{{user goss_arch}}",
"format": "{{user goss_format}}",
"format_options": "{{user goss_format_options}}",
"goss_file": "{{user goss_entry_file}}",
"inspect": "{{user goss_inspect_mode}}",
"tests": [
"{{user goss_tests_dir}}"
],
"type": "goss",
"url": "{{user goss_url}}",
"use_sudo": true,
"vars_file": "{{user goss_vars_file}}",
"vars_inline": {
"ARCH": "amd64",
"OS": "{{user distro_name | lower}}",
"OS_VERSION": "{{user distribution_version | lower}}",
"PROVIDER": "qemu",
"containerd_version": "{{user containerd_version}}",
"kubernetes_cni_deb_version": "{{ user kubernetes_cni_deb_version }}",
"kubernetes_cni_rpm_version": "{{ split (user kubernetes_cni_rpm_version) \"-\" 0 }}",
"kubernetes_cni_source_type": "{{user kubernetes_cni_source_type}}",
"kubernetes_cni_version": "{{user kubernetes_cni_semver | replace \"v\" \"\" 1}}",
"kubernetes_deb_version": "{{ user kubernetes_deb_version }}",
"kubernetes_rpm_version": "{{ split (user kubernetes_rpm_version) \"-\" 0 }}",
"kubernetes_source_type": "{{user kubernetes_source_type}}",
"kubernetes_version": "{{user kubernetes_semver | replace \"v\" \"\" 1}}"
},
"version": "{{user goss_version}}"
}
],
"variables": {
"accelerator": "kvm",
"ansible_common_vars": "",
"ansible_extra_vars": "ansible_python_interpreter=/usr/bin/python3",
"ansible_user_vars": "",
"artifact_name": "{{user build_name}}-kube-{{user kubernetes_semver}}",
"boot_media_path": "http://{{ .HTTPIP }}:{{ .HTTPPort }}",
"boot_wait": "10s",
"build_timestamp": "{{timestamp}}",
"cd_files": "linux/base/**.nothing",
"containerd_sha256": null,
"containerd_url": " containerd_version}}/cri-containerd-cni-{{user containerd_version}}-linux-amd64.tar.gz",
"containerd_version": null,
"cpus": "1",
"crictl_url": " crictl_version}}/crictl-v{{user crictl_version}}-linux-amd64.tar.gz",
"crictl_version": null,
"disk_compression": "false",
"disk_discard": "unmap",
"disk_image": "false",
"disk_size": "20480",
"existing_ansible_ssh_args": "{{env ANSIBLE_SSH_ARGS}}",
"firmware": "",
"format": "qcow2",
"headless": "true",
"http_directory": "./packer/qemu/linux/{{user distro_name}}/http/",
"kubernetes_cni_deb_version": null,
"kubernetes_cni_http_source": null,
"kubernetes_cni_semver": null,
"kubernetes_cni_source_type": null,
"kubernetes_container_registry": null,
"kubernetes_deb_gpg_key": null,
"kubernetes_deb_repo": null,
"kubernetes_deb_version": null,
"kubernetes_http_source": null,
"kubernetes_load_additional_imgs": null,
"kubernetes_rpm_gpg_check": null,
"kubernetes_rpm_gpg_key": null,
"kubernetes_rpm_repo": null,
"kubernetes_rpm_version": null,
"kubernetes_semver": null,
"kubernetes_series": null,
"kubernetes_source_type": null,
"machine_id_mode": "444",
"memory": "2048",
"oem_id": "",
"output_directory": "./output/{{user build_name}}-kube-{{user kubernetes_semver}}",
"python_path": "",
"qemu_binary": "qemu-system-x86_64",
"ssh_password": "$SSH_PASSWORD",
"ssh_username": "builder",
"vm_name": "{{user build_name}}-kube-{{user kubernetes_semver}}",
"vnc_bind_address": "127.0.0.1"
}
}
Debugging Steps Taken
{
"boot_command_prefix": "clinux /casper/vmlinuz --- autoinstall ds='nocloud-net;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/24.04/'initrd /casper/initrdboot",
"build_name": "ubuntu-2404",
"distribution_version": "2404",
"distro_name": "ubuntu",
"guest_os_type": "ubuntu-64",
"iso_checksum": "d6dab0c3a657988501b4bd76f1297c053df710e06e0c3aece60dead24f270b4d",
"iso_checksum_type": "sha256",
"iso_url": "",
"os_display_name": "Ubuntu 24.04",
"shutdown_command": "shutdown -P now",
"unmount_iso": "true"
}
Any tips would be greatly appreciated! I’m happy to provide more logs or test suggestions.
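One local sanity check worth running here (a sketch; the ISO filename below is a placeholder, not from the logs): hash a manually downloaded ISO and compare it against the sha256 pinned as `iso_checksum` in the var file above.

```shell
# Compare a downloaded ISO against the sha256 pinned in qemu-ubuntu-2404.json.
# EXPECTED is copied from the var file above; the ISO filename is a placeholder.
EXPECTED="d6dab0c3a657988501b4bd76f1297c053df710e06e0c3aece60dead24f270b4d"
ISO="ubuntu-24.04-live-server-amd64.iso"
if [ -f "$ISO" ]; then
  ACTUAL="$(sha256sum "$ISO" | awk '{print $1}')"
  if [ "$ACTUAL" = "$EXPECTED" ]; then
    echo "checksum ok"
  else
    echo "checksum MISMATCH: got $ACTUAL"
  fi
else
  echo "$ISO not found; download it first"
fi
```

A mismatch here would point at a corrupt download or a stale `iso_checksum` rather than a packer problem.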
/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/.local/bin/packer build -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/kubernetes.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/cni.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/containerd.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/wasm-shims.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/ansible-args.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/goss-args.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/common.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/additional_components.json" -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/config/ecr_credential_provider.json" -color=true -var-file="/builds/magalu-cloud-iaas/k8s/image-builder/images/capi/packer/qemu/qemu-ubuntu-2404.json" packer/qemu/packer.json
qemu: output will be in this color.
==> qemu: Retrieving ISO
==> qemu: Trying
==> qemu: Trying
==> qemu: Download failed context deadline exceeded
==> qemu: error downloading ISO: [context deadline exceeded]
Build 'qemu' errored after 30 minutes 515 milliseconds: error downloading ISO: [context deadline exceeded]
==> Wait completed after 30 minutes 515 milliseconds
==> Some builds didn't complete successfully and had errors:
--> qemu: error downloading ISO: [context deadline exceeded]
==> Builds finished but no artifacts were created.
make[2]: *** [Makefile:560: build-qemu-ubuntu-2404] Error 1
make[2]: Leaving directory '/builds/magalu-cloud-iaas/k8s/image-builder/images/capi'
make[1]: *** [Makefile:1245: mgc-build-image] Error 2
make[1]: Leaving directory '/builds/magalu-cloud-iaas/k8s/image-builder/images/capi'
make: *** [Makefile:37: mgc-build-image] Error 2
Cleaning up project directory and file based variables
ERROR: Job failed: command terminated with exit code 1
@Karine Santos Looking at the messages, this error seems related to the ISO download.
The first error indicates a timeout while downloading the ISO file.
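One detail worth checking before anything else: in the pasted qemu-ubuntu-2404 variables, `iso_url` is an empty string, so it may help to fail fast on that instead of waiting out a 30-minute timeout. A sketch (using python3 for JSON parsing; the inline sample file here stands in for the real var file, which in this job lives under packer/qemu/):

```shell
# Fail fast when iso_url is empty in a packer var file.
# The mktemp'd sample below is for illustration; in a real run, point
# VAR_FILE at the qemu-ubuntu-2404.json var file instead.
VAR_FILE="$(mktemp)"
printf '{"iso_url": ""}' > "$VAR_FILE"
ISO_URL="$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1])).get("iso_url",""))' "$VAR_FILE")"
if [ -z "$ISO_URL" ]; then
  echo "iso_url is empty in $VAR_FILE; packer has nothing to download"
else
  echo "iso_url: $ISO_URL"
fi
rm -f "$VAR_FILE"
```

If `iso_url` really is empty in the job's var file, that would explain the retries and the eventual "context deadline exceeded".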
Image-builder v0.1.44 is now available:
Thanks to all contributors! 💙
set the channel topic: Slack channel for the image-builder project: https://github.com/kubernetes-sigs/image-builder
Office Hours: https://docs.google.com/document/d/100uv2GmlgWyLBVP65W6ABNJ_EqbvVYTYtTilCLbnVYI/edit
set up a reminder “https://docs.google.com/document/d/100uv2GmlgWyLBVP65W6ABNJ_EqbvVYTYtTilCLbnVYI/edit” in this channel at 8:30AM every other Monday (next occurrence is June 16th), Mountain Daylight Time.
set up a reminder “Image-Builder office hours start in 1 hour. Agenda: https://docs.google.com/document/d/100uv2GmlgWyLBVP65W6ABNJ_EqbvVYTYtTilCLbnVYI/edit” in this channel at 8:30AM every other Monday (next occurrence is June 16th), Mountain Daylight Time.
@Marcus Noble / @mboersma the PR needs your approvals to test and merge. The changeset primarily targets vSphere OVA builds.
cc: @palnabarun
It's on my list. Will try and get to it today but so far my day isn't going too well 😅
Is there any more context to this? There's no related issues nor any explanation why it's needed or what the impact would be.
Will create a feature request with the details and also update the PR with the context. In a nutshell, this is mostly about improving node performance.
@Marcus Noble Please do let me know if you need any further information.
Hey folks, can I get a review for ? This re-adds the presubmit vsphere/OVA CI job (as optional), based on the new community-based infra.
I’m doing the testing and required changes to make it finally work in
Is anybody aware of any known failures of the /test pull-gcp-all PR check? It's not related to my change (Make kubelet starting as a windows service by zylxjtu · Pull Request #1752 · kubernetes-sigs/image-builder), so I'm wondering if there's a known issue (maybe infra-related?). Thanks!
@Marcus Noble Hello 👋 I'm working with @Tomy Guichard and @Leïla MARABESE at Scaleway. We can have a conversation about what you need for integrating image-builder with Scaleway 🙂
👋 Hey y'all!
I just read the comment on the PR 🙂 Really happy to see Scaleway is willing to at least contribute effort to supporting this new provider! 💙 From my perspective having y'all down as reviewers for this provider would be enough (for now) for me to be happy to add this provider to image-builder. (@mboersma @Drew Hudson-Viles @jsturtevant do y'all agree? 🙂)
If possible I would LOVE to see some infra support from Scaleway for PR testing, as we're very lacking in that regard across image-builder. It would go a long way toward making Scaleway a solid, reliable option in image-builder, but I also know how difficult these kinds of agreements can be, so I'm staying realistic 🙂
On a side note - very happy to see the Scaleway provider officially supported 😄 I'm a Scaleway user myself although just a single cluster plus some other resources.
Yo! Just here to echo Marcus really. Anything that can be provided, be it support for the provider itself or, if possible of course, infra for testing it, would be amazing. Welcome!
From a credentials-sharing perspective, what is the process on the CNCF side for sharing this kind of access? Is it organic and scope-based, or does the CNCF have a way for vendors to share these kinds of resources?
A Scaleway organization supports projects, and we can have several projects within one organization.
I would need to check with CNCF. I'm not totally sure myself. Maybe @mboersma knows more but he's currently on vacation.
So I'm trying to envision the best architecture for it: what email address to invite into the organization, how to track credentials, and how to audit who can access what.
Maybe @bentheelder could share some insight on how providers go about donating cloud resources for testing 🙏
@chrischdi may be also be able to provide some insight here?
From my experience it's not only about sharing credentials. If I understand correctly, the infrastructure must be owned by the CNCF/community, so it's more about donating credits or money that the community can then leverage. But I'm not the one who went through the process; maybe it's best to ask the test-infra folks what the viable options are.
But in our case (vSphere) it was different, because we still use a public cloud (in our case Google Cloud) for the infrastructure, and are not a cloud provider ourselves.
⚠️ Looks like this Slack workspace is moving to the free tier this week 😳 If I understand correctly, that means we'll lose history older than 90 days in this channel.
If there is anything from the history in this channel that you refer back to, please mention it in this thread and I'll try to get it backed up somewhere more permanent (GitHub or similar).
Changes to Kubernetes Slack | Kubernetes Contributors
We're seeing some GCP failures in our PR tests, does anyone know / can confirm for me if the official Ubuntu 20.04 base image is no longer available on GCP now that support has ended?
googlecompute.ubuntu-2004: Error getting source image for instance creation: Could not find image, ubuntu-2004-lts, in projects
Unless anyone shouts at me not to, I propose removing Ubuntu 20.04 from GCP as it's EOL anyway.
I’m not able to make it today but I have added some points to the agenda
I'll be there, hopefully others can join.
I might actually be able to join from my phone in about 5 min
Slight correction to what I said @mboersma - the public channels are archived but they’re offline and not currently searchable so we can’t reference them. (https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md#what-information-will-we-lose)
Sorry, my bad. I had to set off to pick Ada up from nursery early today and forgot to drop a message in here. The day has flown by and time got away from me; before I knew it I was heading out and this was already done 🤦♂️🤦♂️